FOREWORD


This report represents Part IV of a series of reports to be published under 
the same title with the following subtitles: 

Part I : Background 

Part II: Advanced Techniques - The Linear Channel 

Part III: Advanced Techniques - The Nonlinear Channel

ABSTRACT


In addition to decoding convolutional codes, the Viterbi algorithm is useful 
in a host of other applications, some of which include: maximum likelihood demodu- 
lation of new bandwidth efficient modulations such as minimum-shift-keying (MSK) 
and continuous phase frequency-shift-keying (CPFSK) , demodulation of intersymbol 
interference and partial response signals, estimation and smoothing, and simul- 
taneous phase synchronization/data detection. Performance bounds for these new 
and exciting applications of the Viterbi algorithm can be obtained by a generali- 
zation of the transfer function approach originally introduced by Viterbi for 
obtaining bit-error probability bounds on the performance of specific convolutional 
codes over specific symmetric channels. In Appendix A we examine the use of the 
Viterbi algorithm in a general context and present the generalized transfer 
function bounds necessary to carry out the applications mentioned above. 

The well-known Chernoff and Bhattacharyya bounds can, under certain conditions,
be made tighter than their commonly quoted standard versions by a factor of
one-half. Using a new approach, Appendix B reviews sufficient conditions under
which these reductions can occur, at the same time making these conditions less
restrictive but also harder to verify.
APPENDIX A


Generalized Transfer Function Bounds 


I. Introduction 

In this appendix we derive performance bounds for the Viterbi algorithm 
used in a general estimation/detection context. Special cases include decoding 
convolutional codes, demodulation of new bandwidth efficient modulations such as 
MSK and CPFSK, demodulation of intersymbol interference signals, estimation and 
smoothing, and simultaneous synchronization/data detection. 

Our approach is to generalize the transfer function bounds originally used
to evaluate the bit error probability of binary convolutional codes with
binary-input output-symmetric memoryless channels (Ref. 1). We begin by
describing the general estimation/detection problem and the use of the Viterbi
algorithm in its solution. Next, "super state" diagrams are defined and
generalized transfer function bounds are derived. Special forms of these state
diagrams and transfer function bounds are then examined.

II. Discrete-Time System Model

We assume the discrete-time system shown in Figure A-1. Here the signal
is described as a general finite state system given by the output

    $x_k = f(s_k, u_k)$    (A.2.1)

and state relation

    $s_{k+1} = g(s_k, u_k)$    (A.2.2)

where $u_k$, $x_k$, and $s_k$ have finite alphabets denoted U, X, and S respectively.
The sizes of these alphabets are denoted |U|, |X|, and |S|. Note that while |S|
determines the number of states, |U| determines the number of next state
transitions from a given state. The signal inputs $\{u_k\}$ are i.i.d. discrete
random variables with probability function

    $q(u), \quad u \in U$


Figure A-1. Discrete-Time System Model (blocks: SIGNAL, CHANNEL, VITERBI ALGORITHM)


The channel or observation is described by

    $y_k = h(x_k, n_k)$    (A.2.3)

where $\{n_k\}$ are i.i.d. random variables independent of the signal inputs
$\{u_k\}$. Here $n_k$ and $y_k$ can be continuous or discrete valued.

The receiver is described by a Viterbi algorithm which uses a metric

    $m((s_k, u_k), y_k)$    (A.2.4)

for the branches of the trellis diagram. This metric may take many
possible forms, such as:

(a) Maximum Likelihood (ML):

    $m((s_k, u_k), y_k) = \ln p(y_k \mid s_k, u_k) = \ln p(y_k \mid x_k)$    (A.2.5a)

(b) Maximum A Posteriori (MAP):

    $m((s_k, u_k), y_k) = \ln [\, p(y_k \mid x_k)\, q(u_k) \,]$    (A.2.5b)

(c) Minimum Mean Square Error (MSE):

    $m((s_k, u_k), y_k) = -(y_k - x_k)^2$    (A.2.5c)


Independent of the metric used by the Viterbi algorithm, we may wish to
evaluate the overall performance using a distortion measure
$d((\hat{s}_k, \hat{u}_k), (s_k, u_k))$. This measure may be any nonnegative
function such as:

(a) Error Distortion:

    $d((\hat{s}_k, \hat{u}_k), (s_k, u_k)) = 1$ if $\hat{u}_k \ne u_k$;  $= 0$ if $\hat{u}_k = u_k$    (A.2.6a)

(b) Mean Square Error:

    $d((\hat{s}_k, \hat{u}_k), (s_k, u_k)) = \alpha (\hat{s}_k - s_k)^2 + \beta (\hat{u}_k - u_k)^2$  for any $\alpha, \beta \ge 0$    (A.2.6b)

In the usual convolutional coding application of the Viterbi algorithm the ML
metric is used and the error distortion measure gives the bit error probability
bound. If, on the other hand, we wish to estimate the phase of a signal that is
modeled as a Markov chain, the MAP metric might be used in the Viterbi algorithm
and the mean square error distortion measure would give the resulting mean square
error bound. Although there is a natural relationship between the metric used
by the Viterbi algorithm and the distortion measure used for evaluating
performance, we do not require any connection between these quantities. Indeed,
for a given metric, we shall consider cases where we evaluate performance in
terms of two different distortion measures. For example, when a Viterbi
algorithm is used to simultaneously estimate phase and demodulate data, we would
be interested in both the mean square phase error and the bit error probability.

III. The Viterbi Algorithm

Let us assume that the discrete-time system of Figure A-1 begins at t = 0
with initial state $s_0$ known to the receiver. The receiver then uses the channel
output sequence $y_0, y_1, y_2, \ldots$ to estimate the particular state sequence
$\hat{s}_1, \hat{s}_2, \ldots$ or equivalently the particular signal input sequence
$\hat{u}_0, \hat{u}_1, \hat{u}_2, \ldots$ that maximizes the total metric

    $\sum_{k=0}^{\infty} m((\hat{s}_k, \hat{u}_k), y_k)$

over all possible sequences $\{(\hat{s}_k, \hat{u}_k)\}$.

The Viterbi algorithm is an optimum algorithm for any additive metric and 
a finite state signal model. The key to understanding this algorithm is the 
trellis diagram description of the signal process. Suppose for example we have 

    |S| = 4

    |U| = 3    (A.3.1)

The state diagram for the signal process might then be as shown in Figure A-2,
where each of the 4 nodes denotes a state and each state has 3 next state
transitions.

If we were to give a time-sequence of the possible state transitions starting
with some initial state, then we have the corresponding trellis diagram of
Figure A-3. The key point here is that all possible sequences $\{(s_k, u_k)\}$ are
represented by paths in the trellis diagram.

Suppose now we have a trellis diagram with M states,

    $S = \{A_1, A_2, \ldots, A_M\}$    (A.3.2)

Figure A-2. State Diagram, |U| = 3, |S| = 4

Figure A-3. Trellis Diagram, |U| = 3, |S| = 4

A typical path is sketched in Figure A-4. As the receiver receives the channel
output sequence $y_0, y_1, \ldots$ it can compute a metric value for each branch or
transition from state to state along this path. In this way the total
metric up to time t = n+1 is

    $\sum_{k=0}^{n} m((s_k, u_k), y_k)$

for the particular sequence $\{(s_k, u_k)\}$ which defines the path.

Figure A-4. Typical Path and Metric
The optimum receiver with respect to this additive metric considers all
paths in the trellis diagram and as $t \to \infty$ chooses the path which corresponds
to the maximum total metric. The key feature of the Viterbi algorithm is the
elimination of paths without loss in optimality whenever two or more paths merge
to the same state.

In Figure A-5 we show this elimination of paths characteristic of the
Viterbi algorithm. Suppose that two paths $\{(s_k, u_k)\}$ and $\{(\hat{s}_k, \hat{u}_k)\}$
merge at time t = n+1 to the same state, as shown in Figure A-5. Then the metrics
accumulated up to this point are

    $\sum_{k=0}^{n} m((s_k, u_k), y_k)$

and

    $\sum_{k=0}^{n} m((\hat{s}_k, \hat{u}_k), y_k)$

Note that any remaining segment of the two paths starting at the merged state at
t = n+1 can be the same for either initial sequence. Since we are only interested
in finding any maximum metric sequence, without any loss of optimality we can
eliminate one of these two initial path sequences from further consideration,
namely, the one with the smaller accumulated metric. Thus, for example, if

    $\sum_{k=0}^{n} m((s_k, u_k), y_k) > \sum_{k=0}^{n} m((\hat{s}_k, \hat{u}_k), y_k)$    (A.3.3)

then we can eliminate the initial path sequence $\{(\hat{s}_k, \hat{u}_k)\}$ from further
consideration. When more than two paths merge to one state we can eliminate all
but one path from further consideration and keep only the one with the largest
accumulated metric.
Since there are only a finite number of states

    M = |S|,    (A.3.4)

at most M paths are retained by the Viterbi algorithm as the channel output
sequence $y_0, y_1, \ldots$ is received. This is in contrast to the number of possible
paths $|U|^n$ up to time t = n. By eliminating paths that are not maximum metric
each time paths merge in the trellis, the Viterbi algorithm holds the number of
retained paths at roughly M rather than letting it grow exponentially with time.
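
The survivor-path idea above can be sketched in a few lines of Python. This is only an illustrative sketch, not the report's implementation: the function names `viterbi` and `brute_force`, the argument names, and the two-tap intersymbol interference example in the demo are assumptions introduced here. The sketch keeps one survivor per reachable state and compares the result with an exhaustive search over all $|U|^n$ input sequences.

```python
import itertools


def viterbi(observations, inputs, next_state, metric, initial_state):
    """Maximize an additive branch metric over all input sequences.

    next_state(s, u) plays the role of g in (A.2.2) and metric(s, u, y)
    the role of m((s, u), y) in (A.2.4). Returns (total_metric, inputs).
    """
    # One survivor (accumulated metric, input path) per reachable state.
    survivors = {initial_state: (0.0, [])}
    for y in observations:
        candidates = {}
        for s, (total, path) in survivors.items():
            for u in inputs:
                s_next = next_state(s, u)
                cand = total + metric(s, u, y)
                # Paths merging at s_next: keep only the largest metric.
                if s_next not in candidates or cand > candidates[s_next][0]:
                    candidates[s_next] = (cand, path + [u])
        survivors = candidates
    return max(survivors.values(), key=lambda item: item[0])


def brute_force(observations, inputs, next_state, metric, initial_state):
    """Exhaustive search over all |U|^n input sequences, for comparison."""
    best = None
    for seq in itertools.product(inputs, repeat=len(observations)):
        s, total = initial_state, 0.0
        for u, y in zip(seq, observations):
            total += metric(s, u, y)
            s = next_state(s, u)
        if best is None or total > best[0]:
            best = (total, list(seq))
    return best


if __name__ == "__main__":
    # Hypothetical example: binary inputs, state = previous input,
    # signal x = 1.0*u + 0.5*s, squared-error metric as in (A.2.5c).
    obs = [1.4, -0.6, -1.3, 0.4, 1.6]
    nxt = lambda s, u: u
    met = lambda s, u, y: -(y - (1.0 * u + 0.5 * s)) ** 2
    print(viterbi(obs, [-1, 1], nxt, met, 1))
    print(brute_force(obs, [-1, 1], nxt, met, 1))
```

The dictionary of survivors never holds more than M entries, which is the point made in the text; the exhaustive search is included only to check that discarding merged paths loses nothing.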

Another important feature of the Viterbi algorithm is that for all metrics 
of interest, there is negligible loss of optimality associated with making 
final decisions concerning the maximum metric paths at some fixed lag time as 
channel outputs are received. This is illustrated in Figure A-6 where we assume
that the channel output at time $t = \ell$ is being processed by the Viterbi algorithm
so that the M surviving paths are computed up to this time. Typically the M
surviving paths, one of which is the true maximum metric path, share common
initial parts. By considering a large enough lag time L, then with high
probability only one initial part remains for all M paths at this lag time
$t = \ell - L$. For convolutional codes the choice (Ref. 1)

    $L > 5 \log_2 M$    (A.3.5)

is large enough to guarantee negligible loss in performance. The Viterbi
algorithm is thus practically realized as a fixed lag estimator of the sequence
$\{(\hat{s}_k, \hat{u}_k)\}$ that maximizes the total metric

    $\sum_{k=0}^{\infty} m((\hat{s}_k, \hat{u}_k), y_k)$

as it receives from the channel the observations $y_0, y_1, \ldots$.

Figure A-6. Fixed Lag Decisions
IV. Error Events

The performance of the discrete-time system of Figure A-1 is defined by a
distortion measure

    $d((\hat{s}_n, \hat{u}_n), (s_n, u_n))$

where $(s_n, u_n)$ is the true signal state and input at time t = n while
$(\hat{s}_n, \hat{u}_n)$ is the state and input selected by the Viterbi algorithm for
the same time. Without loss of generality, we assume this distortion measure is
nonnegative and, in addition,

    $d((s_n, u_n), (s_n, u_n)) = 0$    (A.4.1)

The condition $(\hat{s}_n, \hat{u}_n) \ne (s_n, u_n)$ can only occur when the Viterbi
algorithm eliminates a segment of the true path that includes the state $s_n$.
When this happens we have an error event which is characterized by, say, times
i and j where $i \le n < j$ and

    $\hat{s}_i = s_i, \quad \hat{s}_j = s_j$

    $\hat{s}_k \ne s_k, \quad i < k < j$

    $\sum_{k=i}^{j-1} m((\hat{s}_k, \hat{u}_k), y_k) > \sum_{k=i}^{j-1} m((s_k, u_k), y_k)$    (A.4.2)

Figure A-7 illustrates such an error event.

In general for fixed time t = n, there are many possible error events that
can lead to the condition $(\hat{s}_n, \hat{u}_n) \ne (s_n, u_n)$. The beginning of an
error event at time i can be anywhere from t = 0 to t = n while the end of an
error event can range from t = n+1 to $t = \infty$. In the subsequent analysis we
shall upper bound our performance by assuming a steady state condition where
t = n >> 0 is assumed so that we allow the initial time of an error event to
range from $t = -\infty$ to t = n. This will result in an upper bound on performance
since in considering error events we include more of these than are necessary.

Figure A-7. Error Event

We now examine the probability of the occurrence of a particular error
event. Again let $\{(s_k, u_k)\}$ be the true signal state and input sequence.
Suppose $\{(\hat{s}_k, \hat{u}_k)\}$ is any other possible state and input sequence where
for $i \le n < j$ we have

    $\hat{s}_i = s_i, \quad \hat{s}_j = s_j, \quad \hat{s}_k \ne s_k, \ i < k < j$    (A.4.3)
Since the error event only involves subsequences from i to j, we denote these
as

    $s[i,j] = (s_i, s_{i+1}, \ldots, s_j)$

    $\hat{s}[i,j] = (\hat{s}_i, \hat{s}_{i+1}, \ldots, \hat{s}_j)$    (A.4.4)

We now bound the probability of this error event denoted by

    $P(s[i,j] \to \hat{s}[i,j]) = \Pr\{ \sum_{k=i}^{j-1} m((\hat{s}_k, \hat{u}_k), y_k) \ge \sum_{k=i}^{j-1} m((s_k, u_k), y_k) \mid s[i,j] \}$

    $= \Pr\{ \sum_{k=i}^{j-1} [ m((\hat{s}_k, \hat{u}_k), y_k) - m((s_k, u_k), y_k) ] \ge 0 \mid s[i,j] \}$    (A.4.5)

where the probability is over the channel noise sequence $n_k$, $i \le k < j$. Using
the Chernoff bound (Ref. 2) with parameter $\lambda \ge 0$, and noting that the random
variables $n_k$ are independent, we have

    $P(s[i,j] \to \hat{s}[i,j]) \le E\{ \exp[ \lambda \sum_{k=i}^{j-1} \{ m((\hat{s}_k, \hat{u}_k), y_k) - m((s_k, u_k), y_k) \} ] \mid s[i,j] \}$

    $= \prod_{k=i}^{j-1} E\{ \exp[ \lambda \{ m((\hat{s}_k, \hat{u}_k), y_k) - m((s_k, u_k), y_k) \} ] \mid s_k, u_k \}$

    $= \prod_{k=i}^{j-1} D_\lambda((\hat{s}_k, \hat{u}_k), (s_k, u_k))$    (A.4.6)
where

    $D_\lambda((\hat{s}_k, \hat{u}_k), (s_k, u_k)) = E\{ \exp[ \lambda \{ m((\hat{s}_k, \hat{u}_k), y_k) - m((s_k, u_k), y_k) \} ] \mid s_k, u_k \}$    (A.4.7)

The function $D_\lambda((\hat{s}_k, \hat{u}_k), (s_k, u_k))$ can, in general, be numerically
evaluated and has some well-known special cases. In particular, for the ML
metric of (A.2.5a), it can be shown that $\lambda = 1/2$ almost always minimizes the
Chernoff bound, resulting in the Bhattacharyya bound (Ref. 1)

    $D_{1/2}((\hat{s}_k, \hat{u}_k), (s_k, u_k)) = \sum_{y_k} \sqrt{ p(y_k \mid \hat{x}_k)\, p(y_k \mid x_k) }$    (A.4.8)

In arriving at (A.4.8), we have made use of the fact that the statistical
expectation in (A.4.7) is taken over the conditional probability distribution
$p(y_k \mid x_k)$.
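
As a numerical illustration of (A.4.7) and (A.4.8), the sketch below evaluates the Chernoff factor for the ML metric when the channel adds unit-variance Gaussian noise to a scalar signal. The function name `chernoff_factor`, the integration grid, and the example signal values are assumptions made here for illustration. For this channel the factor has the closed form $\exp[-\lambda(1-\lambda)(\hat{x} - x)^2/2]$, which is smallest at $\lambda = 1/2$.

```python
import numpy as np


def chernoff_factor(lam, x_true, x_hat, half_width=12.0, num=200001):
    """Numerically evaluate D_lambda of (A.4.7) for the ML metric.

    Assumes y = x + n with n ~ N(0, 1), so the factor equals the integral
    of p(y | x_hat)**lam * p(y | x_true)**(1 - lam) over y.
    """
    lo = min(x_true, x_hat) - half_width
    hi = max(x_true, x_hat) + half_width
    y = np.linspace(lo, hi, num)
    dy = y[1] - y[0]
    log_norm = -0.5 * np.log(2.0 * np.pi)
    log_p_true = log_norm - 0.5 * (y - x_true) ** 2
    log_p_hat = log_norm - 0.5 * (y - x_hat) ** 2
    # exp(lam * (m_hat - m_true)) weighted by the true density p(y | x_true).
    integrand = np.exp(lam * (log_p_hat - log_p_true) + log_p_true)
    return float(np.sum(integrand) * dy)  # simple Riemann sum


if __name__ == "__main__":
    for lam in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(lam, chernoff_factor(lam, x_true=1.0, x_hat=-1.0))
```

With `x_true = 1` and `x_hat = -1` the value at $\lambda = 1/2$ should agree with $e^{-1/2}$, the Bhattacharyya value for this pair of signals.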


V. Average Distortion

Next we consider the set of all error events beginning at i and ending at j
by defining subsequences

    $S(i,j \mid s[i,j]) = \{ \hat{s}[i,j] : \hat{s}_i = s_i, \ \hat{s}_j = s_j, \ \hat{s}_k \ne s_k, \ i < k < j \}$    (A.5.1)

Note that for any subsequence

    $\hat{s}[i,j] \in S(i,j \mid s[i,j])$

the distortion at time t = n is

    $d((\hat{s}_n, \hat{u}_n), (s_n, u_n))$
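Hence the average distortion between the maximum metric sequence
$\{(\hat{s}_k, \hat{u}_k)\}$ and the actual sequence $\{(s_k, u_k)\}$ at time t = n is
bounded as follows:

    $E\{ d((\hat{s}_n, \hat{u}_n), (s_n, u_n)) \mid s \} \le \sum_{i \le n} \sum_{j > n} \sum_{S(i,j \mid s[i,j])} d((\hat{s}_n, \hat{u}_n), (s_n, u_n)) \Pr(\tilde{s}[i,j] = \hat{s}[i,j] \mid s[i,j])$

    $\le \sum_{i \le n} \sum_{j > n} \sum_{S(i,j \mid s[i,j])} d((\hat{s}_n, \hat{u}_n), (s_n, u_n)) \, P(s[i,j] \to \hat{s}[i,j])$    (A.5.2)

Here $\tilde{s}[i,j]$ denotes the subsequence selected by the Viterbi algorithm. The
second inequality in (A.5.2) comes about because
$\Pr(\tilde{s}[i,j] = \hat{s}[i,j] \mid s[i,j])$, the probability that $\hat{s}[i,j]$ has the
maximum metric of all error event subsequences, is less than
$P(s[i,j] \to \hat{s}[i,j])$, which is the probability that $\hat{s}[i,j]$ has a greater
metric than only that of the true subsequence $s[i,j]$.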

In general we are interested in the above distortion averaged over all
true state subsequences $\{s[i,j]\}$. For the special case of convolutional codes
over symmetric channels, the bound is independent of the particular state
subsequence $s[i,j]$ and a transfer function bound is easily obtained. In the more
general case of interest here, we should average over all possible true signal
state sequences. In performing this average, we recognize that any true state
subsequence represents a first order Markov chain and thus is characterized by
the probability distribution

    $q(s[i,j]) = p(s_i)\, q(u_i)\, q(u_{i+1}) \cdots q(u_{j-1})$    (A.5.3)

where $p(s_i)$ is the steady state probability of state $s_i$.

Next we define the set of subsequence pairs

    $S(i,j) = \{ (\hat{s}[i,j], s[i,j]) : \hat{s}_i = s_i, \ \hat{s}_j = s_j, \ \hat{s}_k \ne s_k; \ i < k < j \}$    (A.5.4)

each pair consisting of an error event subsequence and the true state
subsequence. Then averaging (A.5.2) over all subsequences $s[i,j]$ yields*

    $E\{ d((\hat{s}_n, \hat{u}_n), (s_n, u_n)) \} \le \sum_{s} q(s) \sum_{i \le n} \sum_{j > n} \sum_{S(i,j \mid s[i,j])} d((\hat{s}_n, \hat{u}_n), (s_n, u_n)) \, P(s[i,j] \to \hat{s}[i,j])$

    $= \sum_{i \le n} \sum_{j > n} \sum_{S(i,j)} q(s[i,j]) \, d((\hat{s}_n, \hat{u}_n), (s_n, u_n)) \, P(s[i,j] \to \hat{s}[i,j])$    (A.5.5)


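Using (A.5.3) and the bound (A.4.6) in this expression yields the bound

    $E\{ d((\hat{s}_n, \hat{u}_n), (s_n, u_n)) \} \le \sum_{i \le n} \sum_{j > n} \sum_{S(i,j)} d((\hat{s}_n, \hat{u}_n), (s_n, u_n)) \, p(s_i) \prod_{k=i}^{j-1} q(u_k) \, D_\lambda((\hat{s}_k, \hat{u}_k), (s_k, u_k))$    (A.5.6)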


Since we have steady state conditions, the above bound is the same for all 
n; that is, it is independent of n. Because of this invariance to time shifts, 
we can express the bound in (A. 5. 6) in another more compact form. Suppose we 
consider two subsequences 

    $\hat{s}[i,j], \ s[i,j] \in S(i,j)$

As illustrated in Figure A-8, these two subsequences when shifted by L and
denoted

    $\hat{s}[i+L, j+L], \ s[i+L, j+L] \in S(i+L, j+L)$

are also considered in the average distortion provided L satisfies

    $i + L \le n < j + L$

which is analogous to the condition on i and j just prior to (A.4.3). Also
note that we have the conditions requisite to being stationary:

    $q(s[i,j]) = q(s[i+L, j+L])$    (A.5.7)

and

    $P(s[i,j] \to \hat{s}[i,j]) = P(s[i+L, j+L] \to \hat{s}[i+L, j+L])$    (A.5.8)

Figure A-8. Shift by L

*For simplicity of notation, we shall where convenient drop the dependence of
s on i and j.
We can thus include the distortion due to $\hat{s}[i+L, j+L]$ and $s[i+L, j+L]$ at
time t = n by considering the distortion

    $d((\hat{s}_{n-L}, \hat{u}_{n-L}), (s_{n-L}, u_{n-L}))$

due to

    $\hat{s}[i,j], \ s[i,j] \in S(i,j)$

This means we can replace all shifts of the set $S(i,j)$ to $S(i+L, j+L)$ by
including the additional distortion at time t = n - L.

Thus, for each term in (A.5.6) corresponding to a given i, j, and n, we
can equivalently shift these indices to the left by i, and consider i always
fixed at zero, with j replaced by j - i and n likewise replaced by n - i. Hence,
the double sum in (A.5.6) over the region $i \le n < j$ is equivalent to a double
sum in which the first sum runs over j - i = 1, 2, 3, ... and the second
sum runs over $0 \le n - i < j - i$, or $0 \le n - i \le j - i - 1$. Then, letting
$\ell = n - i$ and for simplicity using j to denote j - i, (A.5.6) becomes


    $E\{ d((\hat{s}_n, \hat{u}_n), (s_n, u_n)) \} \le \sum_{j=1}^{\infty} \sum_{S(0,j)} \left[ \sum_{\ell=0}^{j-1} d((\hat{s}_\ell, \hat{u}_\ell), (s_\ell, u_\ell)) \right] p(s_0) \prod_{k=0}^{j-1} q(u_k) \, D_\lambda((\hat{s}_k, \hat{u}_k), (s_k, u_k))$    (A.5.9)

Note that as stated above, we have also set i = 0 in $S(i,j)$ of (A.5.6). Also,
note that, in this form, the bound shows no dependence on the time n. To
emphasize the independence of n the steady state expected distortion is denoted

    $\bar{d} = E\{ d((\hat{s}_n, \hat{u}_n), (s_n, u_n)) \}$    (A.5.10)
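One final step is required to obtain a transfer function bound. Applying
the identity

    $x = \left. \frac{d}{dz} z^{x} \right|_{z=1}$    (A.5.11)

to the sum on $\ell$ in (A.5.9) results in

    $\sum_{\ell=0}^{j-1} d((\hat{s}_\ell, \hat{u}_\ell), (s_\ell, u_\ell)) = \sum_{\ell=0}^{j-1} \left. \frac{d}{dz} z^{d((\hat{s}_\ell, \hat{u}_\ell), (s_\ell, u_\ell))} \right|_{z=1} = \left. \frac{d}{dz} \prod_{\ell=0}^{j-1} z^{d((\hat{s}_\ell, \hat{u}_\ell), (s_\ell, u_\ell))} \right|_{z=1}$    (A.5.12)

Finally, substituting (A.5.12) in (A.5.9) and noting that the product on k in
(A.5.9) is independent of z, we obtain the desired result, namely,

    $\bar{d} \le \left. \frac{d}{dz} T(z) \right|_{z=1}$    (A.5.13)

where the transfer function T(z) is given by

    $T(z) = \sum_{j=1}^{\infty} \sum_{S(0,j)} p(s_0) \prod_{k=0}^{j-1} z^{d((\hat{s}_k, \hat{u}_k), (s_k, u_k))} \, q(u_k) \, D_\lambda((\hat{s}_k, \hat{u}_k), (s_k, u_k))$    (A.5.14)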
VI. Evaluation of the Transfer Function

We now examine the problem of evaluating T(z). First suppose the states
are given by

    $S = \{A_1, A_2, \ldots, A_M\}$    (A.6.1)

and the signal input alphabet by

    $U = \{a_1, a_2, \ldots, a_m\}$    (A.6.2)

We next define "super states" as elements of $S^2 = S \times S$ where

    $S^2 = \{\delta_1, \delta_2, \ldots, \delta_{M^2}\}$    (A.6.3)

Also define the "super signal input" alphabet $U^2 = U \times U$, where

    $U^2 = \{\alpha_1, \alpha_2, \ldots, \alpha_{m^2}\}$    (A.6.4)

Then at time t = k we denote "super states"

    $S_k = (\hat{s}_k, s_k)$    (A.6.5)

and super inputs

    $U_k = (\hat{u}_k, u_k)$    (A.6.6)

Next, the super states $S^2$ are split into two disjoint subsets, namely

    $S_I^2 = \{\delta_1, \delta_2, \ldots, \delta_M\}$    (A.6.7)
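which contains the M equal-component super states

    $\delta_i = (A_i, A_i), \quad i = 1, 2, \ldots, M$    (A.6.8)

and

    $S_B^2 = \{\delta_{M+1}, \delta_{M+2}, \ldots, \delta_{M^2}\}$    (A.6.9)

which contains the unequal-component super states.

With this definition, note that in accordance with (A.5.4)

    $S(0,j) = \{ (\hat{s}[0,j], s[0,j]) : \hat{s}_0 = s_0, \ \hat{s}_j = s_j, \ \hat{s}_k \ne s_k, \ 0 < k < j \}$

    $= \{ S[0,j] : S_0, S_j \in S_I^2, \ S_k \in S_B^2, \ 0 < k < j \}$    (A.6.10)

where

    $S[0,j] = (S_0, S_1, \ldots, S_j)$    (A.6.11)

Next, we use some shorthand notation where the state equations

    $\hat{s}_{k+1} = g(\hat{s}_k, \hat{u}_k), \quad s_{k+1} = g(s_k, u_k)$    (A.6.12)

are expressed as

    $S_{k+1} = G(S_k, U_k)$

and we define

    $P(S_0) \triangleq p(s_0)$

    $d(S_k, U_k) \triangleq d((\hat{s}_k, \hat{u}_k), (s_k, u_k))$

    $q(U_k) \triangleq q(u_k)$

    $D_\lambda(S_k, U_k) \triangleq D_\lambda((\hat{s}_k, \hat{u}_k), (s_k, u_k))$    (A.6.13)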
Then the transfer function T(z) of (A.5.14) can be rewritten as

    $T(z) = \sum_{j=1}^{\infty} \sum_{S(0,j)} P(S_0) \prod_{k=0}^{j-1} z^{d(S_k, U_k)} \, q(U_k) \, D_\lambda(S_k, U_k)$    (A.6.14)

Note that in the above form, T(z) can be interpreted as a transfer function
for the super state diagram of Figure A-9. Here T(z) is the sum over all paths
in Figure A-9, each starting with an initial state belonging to $S_I^2$ and
terminating in a final state also belonging to $S_I^2$, while all intermediate
states are those belonging to $S_B^2$.* The transfer function label of the branch
from state $\delta_i$ to state $\delta_j$ is called $a_{ij}$, where

    $a_{ij} = z^{d(\delta_i, U)} \, q(U) \, D_\lambda(\delta_i, U)$;  if $U \in U^2$ exists such that $\delta_j = G(\delta_i, U)$

    $a_{ij} = 0$;  if not    (A.6.15)



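The transfer function can be expressed in matrix form by defining $t_i(z)$,
$i = M+1, M+2, \ldots, M^2$, as the transfer function from all initial states to the
single intermediate state $\delta_i \in S_B^2$. Defining the $(M^2 - M) \times (M^2 - M)$ matrix
A of branch labels between intermediate states (A.6.16), together with the
vectors $b_i$ of branch labels leaving the initial state $\delta_i$ and $c_j$ of branch
labels entering the final state $\delta_j$ (A.6.17), then the intermediate state
transfer function vector

    $t(z) = (t_{M+1}(z), t_{M+2}(z), \ldots, t_{M^2}(z))^T$    (A.6.18)

satisfies

    $t(z) = A \, t(z) + \sum_{i=1}^{M} p(\delta_i) \, b_i$    (A.6.19)

or

    $t(z) = (I - A)^{-1} \sum_{i=1}^{M} p(\delta_i) \, b_i$    (A.6.20)

where I is the $(M^2 - M) \times (M^2 - M)$ identity matrix. The total transfer function is

    $T(z) = \sum_{j=1}^{M} c_j^T \, t(z) = \sum_{j=1}^{M} \sum_{i=1}^{M} p(\delta_i) \, c_j^T (I - A)^{-1} b_i$    (A.6.21)

where the superscript T denotes transpose.

*For j = 1, there are no intermediate states.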

We next consider taking the derivative of T(z) where we denote

    $c_j' = \frac{d}{dz} c_j, \quad b_i' = \frac{d}{dz} b_i$    (A.6.22)

and

    $A' = \frac{d}{dz} A$    (A.6.23)

The understanding here is that the derivative is taken term by term in each
vector and matrix. Also using the identity

    $I = (I - A)^{-1} (I - A)$    (A.6.24)

we have

    $0 = \frac{d}{dz} \{ (I - A)^{-1} (I - A) \} = (I - A)^{-1} \left( -\frac{d}{dz} A \right) + \left( \frac{d}{dz} (I - A)^{-1} \right) (I - A)$    (A.6.25)

or

    $\frac{d}{dz} (I - A)^{-1} = (I - A)^{-1} A' (I - A)^{-1}$    (A.6.26)
Thus, using (A.6.26)

    $\frac{dT(z)}{dz} = \sum_{j=1}^{M} \sum_{i=1}^{M} p(\delta_i) \{ (c_j')^T (I - A)^{-1} b_i + c_j^T (I - A)^{-1} b_i' + c_j^T (I - A)^{-1} A' (I - A)^{-1} b_i \}$    (A.6.27)

which enables us to evaluate the bound on $\bar{d}$ given in (A.5.13). This evaluation
is limited only by the ability to evaluate

    $(I - A)^{-1} = I + A + A^2 + A^3 + \cdots$    (A.6.28)

The complexity of computing $(I - A)^{-1}$ is determined by the number of nonzero elements
in A. Roughly 2^^ nonzero elements can be handled by a large general purpose 
computer. 
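
The matrix manipulations in (A.6.26) through (A.6.28) are easy to check numerically. The sketch below is illustrative only: the function names and the toy branch labels returned by `labels` (two intermediate states, a single initial and final state, and a hypothetical value D = 0.1) are assumptions made here and are not taken from the report. It evaluates $c^T (I - A)^{-1} b$ and its z-derivative term by term, and the demo compares the derivative with a central finite difference.

```python
import numpy as np


def transfer_function(A, b, c):
    """Evaluate c^T (I - A)^{-1} b without forming the inverse explicitly."""
    n = A.shape[0]
    return float(c @ np.linalg.solve(np.eye(n) - A, b))


def transfer_derivative(A, dA, b, db, c, dc):
    """z-derivative of c^T (I - A)^{-1} b using (A.6.26).

    dA, db, dc are the term-by-term derivatives of A, b, c with respect to z.
    """
    n = A.shape[0]
    R = np.linalg.inv(np.eye(n) - A)
    return float(dc @ R @ b + c @ R @ db + c @ R @ dA @ R @ b)


def labels(z, D=0.1):
    """Hypothetical toy branch labels of the form D**w * z**d."""
    A = np.array([[D * z, D ** 2], [D, D * z ** 2]])
    dA = np.array([[D, 0.0], [0.0, 2.0 * D * z]])
    b = np.array([D ** 2 * z, D * z])
    db = np.array([D ** 2, D])
    c = np.array([D, D ** 2])
    dc = np.zeros(2)
    return A, b, c, dA, db, dc


if __name__ == "__main__":
    A, b, c, dA, db, dc = labels(1.0)
    h = 1e-5
    finite_diff = (transfer_function(*labels(1.0 + h)[:3])
                   - transfer_function(*labels(1.0 - h)[:3])) / (2.0 * h)
    print(transfer_derivative(A, dA, b, db, c, dc), finite_diff)
```

Using `np.linalg.solve` for T(z) avoids an explicit inverse; for large sparse A the series (A.6.28) or a sparse solver would be the natural substitute, which is a design choice outside the scope of this sketch.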

Finally, we note that in most cases of interest the bound given above can
be reduced by a factor of one half. That is, (A.5.13) can be improved to

    $\bar{d} \le \frac{1}{2} \left. \frac{dT(z)}{dz} \right|_{z=1}$    (A.6.29)

General sufficient conditions for this factor of one half are presented in
Appendix B.
VII. Special Cases and Examples

There are special cases where symmetry conditions may allow us to reduce
the number of "super states" which are necessary for the evaluation of transfer
functions.

A. Sequence Independence

Recall that the average distortion given the transmitted sequence s is
bounded by (A.5.2). In some cases this bound is independent of the actual
transmitted signal state sequence s. For such cases, we may pick a convenient
sequence $s^0$ such as one whose elements are all identical, e.g.,

    $s_k^0 = A_1$ for all k    (A.7.1)

assuming this is an allowed sequence. Then, for any sequence s we evaluate the
bound on average distortion using $s^0$ as the assumed sequence. Thus, under this
assumption, (A.5.2) becomes

    $E\{ d((\hat{s}_n, \hat{u}_n), (s_n, u_n)) \mid s \} \le \sum_{i \le n} \sum_{j > n} \sum_{S(i,j \mid s^0[i,j])} d((\hat{s}_n, \hat{u}_n), (A_1, a_1)) \, P(s^0[i,j] \to \hat{s}[i,j])$    (A.7.2)

where

    $u_k^0 = a_1$ for all k    (A.7.3)

is assumed to yield the sequence $s^0$.

Next using the bound of (A.4.6), namely,

    $P(s^0[i,j] \to \hat{s}[i,j]) \le \prod_{k=i}^{j-1} D_\lambda((\hat{s}_k, \hat{u}_k), (A_1, a_1)),$    (A.7.4)
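equality (A.5.3) with all probabilities equal to unity, and the shift invariance
property, we have from (A.5.9) that

    $E\{ d((\hat{s}_n, \hat{u}_n), (s_n, u_n)) \mid s \} \le \sum_{j=1}^{\infty} \sum_{S(0,j \mid s^0[0,j])} \left[ \sum_{\ell=0}^{j-1} d((\hat{s}_\ell, \hat{u}_\ell), (A_1, a_1)) \right] \prod_{k=0}^{j-1} D_\lambda((\hat{s}_k, \hat{u}_k), (A_1, a_1)) = \left. \frac{d}{dz} T_0(z) \right|_{z=1}$    (A.7.5)

where

    $T_0(z) = \sum_{j=1}^{\infty} \sum_{S(0,j \mid s^0[0,j])} \prod_{k=0}^{j-1} z^{d((\hat{s}_k, \hat{u}_k), (A_1, a_1))} \, D_\lambda((\hat{s}_k, \hat{u}_k), (A_1, a_1))$    (A.7.6)

To evaluate the bound of (A.7.5) we need to find the transfer function
$T_0(z)$. Here, we define state transitions from state $A_i$ to state $A_j$ as (see
(A.6.15))

    $a_{ij} = z^{d((A_i, u), (A_1, a_1))} \, D_\lambda((A_i, u), (A_1, a_1))$;  if $u \in U$ exists such that $A_j = g(A_i, u)$

    $a_{ij} = 0$;  if not    (A.7.7)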
Then define the (M-1) x (M-1) matrix A of branch labels among the states
$A_2, \ldots, A_M$ (A.7.8), the vector b of branch labels leaving state $A_1$ (A.7.9),
and

    $c = (a_{2,1}, a_{3,1}, \ldots, a_{M,1})^T$    (A.7.10)

By analogy with (A.6.21) the transfer function $T_0(z)$ is then given by

    $T_0(z) = c^T (I - A)^{-1} b$    (A.7.11)

and its derivative becomes

    $\frac{dT_0(z)}{dz} = (c')^T (I - A)^{-1} b + c^T (I - A)^{-1} b' + c^T (I - A)^{-1} A' (I - A)^{-1} b$    (A.7.12)

where the primes again denote differentiation with respect to z. The final bound
has the form given by (A.5.13) with T(z) replaced by $T_0(z)$. Note that here the
evaluation of the bound involves only the M states defined by the original signal
model whereas in the most general case of the previous section we considered $M^2$
"super states."

The most common class of examples where the bound in (A.5.2) is independent
of the actual signal sequence is that corresponding to convolutional codes
transmitted over symmetric channels (Ref. 1). For example, consider the binary
convolutional code shown in Figure A-10a. This is a rate r = 2/3, constraint
length K = 2 code with input alphabet

    U = {(00), (01), (10), (11)}    (A.7.13)

where the first bit of each pair enters the top unit delay and the second bit
enters the bottom unit delay. The output alphabet is

    X = {(000), (001), (010), (011), (100), (101), (110), (111)}    (A.7.14)

and the state is given by $s_n = u_{n-1}$ for all n, so that

    $S = U = \{A_1, A_2, A_3, A_4\}$    (A.7.15)

Next suppose we have a symmetric channel such as that created by a BPSK
modulated signal with additive white Gaussian noise and soft decision decoding
(Ref. 1). Here the channel has input alphabet I = {0, 1}, output alphabet
$W = (-\infty, \infty)$, and the channel conditional probability density function
$p(w \mid i)$ for each $i \in I$, $w \in W$ given by

    $p(w \mid i = 0) = \frac{1}{\sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( w - \sqrt{2E_s/N_0} \right)^2 \right]$

    $p(w \mid i = 1) = \frac{1}{\sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( w + \sqrt{2E_s/N_0} \right)^2 \right]$    (A.7.16)

where $E_s/N_0$ is the BPSK pulse energy-to-noise ratio. In our example the
convolutional code output consists of three binary symbols so that

    $x_k = (x_{k1}, x_{k2}, x_{k3})$    (A.7.17)

and

    $y_k = (y_{k1}, y_{k2}, y_{k3})$    (A.7.18)
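Let us assume a maximum likelihood metric as in (A.2.5a) where now

    $p(y_k \mid x_k) = \prod_{n=1}^{3} p(y_{kn} \mid x_{kn})$    (A.7.19)

with $y_k \in W^3$ and $x_k \in X$. Letting $\bar{x}_k$ denote the binary component vector
$x_k$ with components converted to ±1 by the rule

    0 → 1
    1 → -1    (A.7.20)

then

    $p(y_k \mid x_k) = \prod_{n=1}^{3} p(y_{kn} \mid \bar{x}_{kn})$    (A.7.21)

where $\bar{i}$ is the ±1 representation of i according to the rule in (A.7.20) and
from (A.7.16)

    $p(w \mid \bar{i}) = \frac{1}{\sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( w - \bar{i} \sqrt{2E_s/N_0} \right)^2 \right]$    (A.7.22)

Substituting (A.7.22) into (A.7.21) and taking the natural logarithm of the
result in accordance with (A.2.5a) gives

    $\ln p(y_k \mid x_k) = -\frac{3}{2} \ln 2\pi - \frac{1}{2} \sum_{n=1}^{3} y_{kn}^2 - \frac{3E_s}{N_0} + \sqrt{2E_s/N_0} \sum_{n=1}^{3} y_{kn} \bar{x}_{kn}$    (A.7.23)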
Since the first three terms of (A.7.23) are independent of $x_k$ we can
equivalently consider the metric

    $m((s_k, u_k), y_k) = \sum_{n=1}^{3} y_{kn} \bar{x}_{kn} = (y_k, \bar{x}_k)$    (A.7.24)

where (·,·) denotes the usual inner product of real vectors of dimension three.
In this case, the Bhattacharyya bound of (A.4.8) becomes*

    $D_{1/2}((\hat{s}_k, \hat{u}_k), (s_k, u_k)) = \prod_{n=1}^{3} \int_{-\infty}^{\infty} \sqrt{ p(y_{kn} \mid \hat{\bar{x}}_{kn}) \, p(y_{kn} \mid \bar{x}_{kn}) } \; dy_{kn}$    (A.7.25)

Substituting (A.7.22) into (A.7.25), we get

    $D_{1/2}((\hat{s}_k, \hat{u}_k), (s_k, u_k)) = \exp\left\{ -\frac{E_s}{4N_0} \sum_{n=1}^{3} (\bar{x}_{kn} - \hat{\bar{x}}_{kn})^2 \right\}$    (A.7.26)

*Note, the components of $y_k$ are now continuous random variables and thus the
sum over $y_k$ in (A.4.8) is replaced by integrations over each component.
Letting $d_H(x_k, \hat{x}_k)$ denote the Hamming distance between $x_k$ and $\hat{x}_k$ or
equivalently the number of components of $x_k$ and $\hat{x}_k$ which disagree, then

    $d_H(x_k, \hat{x}_k) = \sum_{n=1}^{3} \frac{1 - \bar{x}_{kn} \hat{\bar{x}}_{kn}}{2}$    (A.7.27)

Substituting (A.7.27) into (A.7.26) gives the desired result, namely,

    $D_{1/2}((\hat{s}_k, \hat{u}_k), (s_k, u_k)) = \exp\left\{ -\frac{E_s}{N_0} d_H(x_k, \hat{x}_k) \right\}$    (A.7.28)

where, furthermore, $E_s = 2E_b/3$ with $E_b$ the energy per data bit. Alternately,
letting

    $D = e^{-E_s/N_0}$    (A.7.29)

we can rewrite (A.7.28) as

    $D_{1/2}((\hat{s}_k, \hat{u}_k), (s_k, u_k)) = D^{d_H(x_k, \hat{x}_k)}$    (A.7.30)

For a coded system we are typically interested in the average bit error
probability. Suppose we consider the distortion measure $d((\hat{s}_k, \hat{u}_k), (s_k, u_k))$
that depends only on $\hat{u}_k$ and $u_k$ according to the following table:
Table A-1

    û_k \ u_k |   00     01     10     11
    ----------+---------------------------
        00    |   0      β      α      α+β
        01    |   β      0      α+β    α
        10    |   α      α+β    0      β
        11    |   α+β    α      β      0

for any α ≥ 0, β ≥ 0. By choosing α = 1, β = 0, the entries in the above table
would be one whenever the first bit in $\hat{u}_k$ and $u_k$ disagree, and zero whenever
they agree. Thus, the average distortion would give the average bit error
probability for input bits entering the convolutional encoder at the upper unit
delay in Fig. A-10a. Conversely, α = 0, β = 1 results in table entries which
are one whenever the second bit in $\hat{u}_k$ and $u_k$ disagree and zero whenever they
agree. Thus, the average distortion would now give the average bit error
probability of the input bits entering the lower unit delay of the encoder.
Finally, equal weights α = β yield the total average bit error probability
(with α = β = 1/2 the distortion is the fraction of the two input bits in error).
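
The entries of Table A-1 follow a simple rule: α is charged when the first bits disagree and β when the second bits disagree. A minimal sketch of that rule is given below; the function name `branch_distortion` and the tuple representation of the 2-bit inputs are assumptions made here for illustration.

```python
import itertools


def branch_distortion(u_hat, u, alpha, beta):
    """Distortion of Table A-1 for 2-bit inputs given as tuples like (0, 1).

    alpha weights a disagreement in the first bit and beta a disagreement
    in the second bit.
    """
    first_differs = u_hat[0] != u[0]
    second_differs = u_hat[1] != u[1]
    return alpha * first_differs + beta * second_differs


if __name__ == "__main__":
    symbols = list(itertools.product((0, 1), repeat=2))  # 00, 01, 10, 11
    for u_hat in symbols:
        row = [branch_distortion(u_hat, u, alpha=1, beta=0) for u in symbols]
        print(u_hat, row)
```

With α = 1, β = 0 the printed rows are the indicator of a first-bit error, matching the first special case described in the text.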

Figure A-11 illustrates the modified state diagram with the initial and
final states given by state $A_1$ and intermediate states $A_2$, $A_3$, $A_4$. The
branch labels between states are determined by substituting (A.7.30) and the
entries of Table A-1 into (A.7.7). By observation of Fig. A-11, we can
directly obtain the transition matrix among nonzero states which is given by


Dz' 




A = 


Dz 


Dz“ Dz“ 


^2 a +3 a +3 ^2 a +6 

D z z D z 


(A. 7.31) 
and vectors



Dz^ 


— 1 
o 

1 

b = 

D^z“ 

9 J9: 





i 

CM 

P 

1 


(A. 7. 32) 


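Here the average distortion is given by*

    $\bar{d} \le \frac{1}{2} \left. \frac{dT_0(z)}{dz} \right|_{z=1} = \frac{1}{2} \left\{ c^T (I - A)^{-1} b' + c^T (I - A)^{-1} A' (I - A)^{-1} b \right\}$    (A.7.33)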


where for z = 1 


b’ = 


eo 


aD' 


(a+g)D' 


(A. 7.34) 


and 


3D 

eD^ 

8D 

aD 

aD 

aD 

(a+3)D^ 

a+6 

(a+e)D 


(A. 7.35) 


*The factor of 1/2 is used here as discussed in Appendix B. Also note that for
this case $c' = 0$, which eliminates the first term of (A.7.12).

In Figure A-12 we show this bound on $\bar{d}$ for the two cases α = 1, β = 0 and
α = 0, β = 1 corresponding, respectively, to the bit error probabilities of the
two data bit sequences entering the upper and lower unit delays of the convo- 
lutional encoder. Note that for = 7 there is a factor of 10 difference in 

the bit error probabilities of the two data bit sequences entering the encoder. 

B. Difference Sequences 

In some examples the conditional average distortion bound given in (A.5.2)
may depend only on differences, e.g., $\hat{u}_k - u_k$ and $\hat{s}_k - s_k$. This allows
us to define "difference states" rather than general "super states" in evaluating
the transfer function bounds on the average distortion. Typically the number of
difference states is much smaller than the number of "super states."

Uncoded amplitude modulated signals transmitted over a linear channel with
intersymbol interference and additive white Gaussian noise are a common example
where only differences are important. For example, with uncoded BPSK modulation,
we typically have the equally probable data bits* $u_k \in U = \{-1, 1\}$ which after
intersymbol interference result in an equivalent discrete-time signal

    $x_k = \sum_{i=0}^{\nu} h_i u_{k-i}$    (A.7.36)

where $\nu$ is the assumed finite memory of the intersymbol interference and
$h_0, h_1, \ldots, h_\nu$ are expressed in terms of the BPSK pulse rate and channel filter
causing the intersymbol interference. The state is defined as the vector

*Here it is convenient to use {-1, 1} rather than {0, 1}. For simplicity,
however, we shall omit the overbar notation on $x_k$.
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'^-1 

V2 



(A, 7. 37) 


and the filter vector is given by 



$$h = \begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_\nu \end{bmatrix} \tag{A.7.38}$$


Then, the signal has the form 


$$x_k = (h, s_k) + h_0 u_k \tag{A.7.39}$$

where $(\cdot,\cdot)$ is again used to denote the inner product of vectors. The state $s_{k+1}$ is obtained by shifting state $s_k$ and adding component $u_k$, i.e., by replacing $k$ by $k + 1$ in (A.7.37).

The channel output is given by

$$y_k = x_k + n_k \tag{A.7.40}$$

where $\{n_k\}$ is an i.i.d. sequence of zero-mean Gaussian random variables which are normalized to have unit variance. We use the natural maximum likelihood metric of (A.7.24), which results in the Chernoff bound becoming the Bhattacharyya bound

$$\int_{-\infty}^{\infty} \sqrt{p(y_k|x_k)\,p(y_k|\hat{x}_k)}\;dy_k = \exp\left[-\tfrac{1}{8}(x_k - \hat{x}_k)^2\right] = \exp\left\{-\tfrac{1}{8}\left[(h, s_k - \hat{s}_k) + h_0(u_k - \hat{u}_k)\right]^2\right\} \tag{A.7.41}$$

We are typically interested in bit error probabilities, so we use the error distortion measure of (A.2.6a), which can be rewritten in the form

$$d((s_k,u_k),(\hat{s}_k,\hat{u}_k)) = \begin{cases} 0; & \hat{u}_k = u_k \\ 1; & \hat{u}_k \ne u_k \end{cases} \tag{A.7.42}$$

Note that here both the metric and the Chernoff bound depend only on the differences $u_k - \hat{u}_k$ and $s_k - \hat{s}_k$. We now examine the transfer function $T(z)$ given by (A.6.14), which upon substitution of (A.7.41) and (A.7.42) becomes

$$T(z) = \sum_{j=1}^{\infty} \sum_{S(0,j)} p(s_0) \prod_{k=0}^{j-1} q(u_k)\, z^{\left(\frac{u_k - \hat{u}_k}{2}\right)^2} \exp\left\{-\tfrac{1}{8}\left[(h, s_k - \hat{s}_k) + h_0(u_k - \hat{u}_k)\right]^2\right\} \tag{A.7.43}$$

To evaluate this, we now take advantage of the fact that only differences occur by defining

$$\epsilon_k = \tfrac{1}{2}(u_k - \hat{u}_k) \tag{A.7.44}$$

which takes on values $\{-1, 0, 1\}$, and the difference state

$$\delta_k = \tfrac{1}{2}(s_k - \hat{s}_k) \tag{A.7.45}$$

Then, the difference state $\delta_{k+1}$ would be obtained by shifting $\delta_k$ and adding component $\epsilon_k$. Here there are $2^\nu$ possible values of the state $s_k$ while there are $3^\nu$ possible values of the difference state $\delta_k$. Recall that the "super states," consisting of pairs $(s_k, \hat{s}_k)$, would have $4^\nu$ possible values.
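The state counts quoted above can be checked by direct enumeration. The following is a minimal sketch, assuming the state is the tuple of the last $\nu$ BPSK symbols as in (A.7.37); the function names are illustrative and do not come from the report.

```python
from itertools import product

def count_states(nu):
    """Return (signal states, difference states, super states) for memory nu."""
    states = list(product((-1, 1), repeat=nu))           # s_k = (u_{k-1}, ..., u_{k-nu})
    diffs = {tuple((a - b) // 2 for a, b in zip(s, t))   # delta_k = (s_k - s_hat_k) / 2
             for s in states for t in states}
    return len(states), len(diffs), len(states) ** 2

def next_difference_state(delta, eps):
    """Shift delta_k and insert eps_k in front to obtain delta_{k+1}."""
    return (eps,) + tuple(delta[:-1])

print(count_states(2))                    # (4, 9, 16)
print(next_difference_state((1, -1), 0))  # (0, 1)
```

For $\nu = 2$ this gives 4 signal states, 9 difference states and 16 super states, in line with $2^\nu$, $3^\nu$ and $4^\nu$.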

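With the difference formulation and the fact that equally probable bits means

$$q(u_k) = \tfrac{1}{2} \tag{A.7.46}$$

we have from (A.7.43) that

$$T(z) = \sum_{j=1}^{\infty} \sum_{S(0,j)} p(s_0) \prod_{k=0}^{j-1} \tfrac{1}{2}\, z^{\epsilon_k^2} \exp\left\{-\tfrac{1}{2}\left[(h, \delta_k) + h_0 \epsilon_k\right]^2\right\} \tag{A.7.47}$$

Recall that the sum over $S(0,j)$ consists of all pairs $s[0,j]$ and $\hat{s}[0,j]$ such that

$$s_0 = \hat{s}_0, \quad s_j = \hat{s}_j \tag{A.7.48}$$

and

$$s_k \ne \hat{s}_k; \quad k = 1, 2, \ldots, j-1 \tag{A.7.49}$$

or, equivalently, a difference sequence

$$\delta[0,j] = (\delta_0, \delta_1, \ldots, \delta_j) \tag{A.7.50}$$

where

$$\delta_0 = 0, \quad \delta_j = 0 \tag{A.7.51}$$

and

$$\delta_k \ne 0; \quad k = 1, 2, \ldots, j-1 \tag{A.7.52}$$

Note that there are $2^\nu$ choices of initial conditions $s_0 = \hat{s}_0$ and thus $p(s_0) = 1/2^\nu$, whereas there is only one choice of initial condition for $\delta_0$. Also note that the error sequence $\{\epsilon_k\}$ does not uniquely specify the pair sequence $\{(u_k, \hat{u}_k)\}$ since

$$\epsilon_k = \begin{cases} 0 & \text{when } u_k = \hat{u}_k = \pm 1 \\ 1 & \text{when } u_k = 1,\ \hat{u}_k = -1 \\ -1 & \text{when } u_k = -1,\ \hat{u}_k = 1 \end{cases} \tag{A.7.53}$$

Thus, if we replace the sum over $S(0,j)$ by the sum over all difference state sequences

$$\mathcal{D}(0,j) = \left\{\delta[0,j] : \delta_0 = 0,\ \delta_j = 0,\ \delta_k \ne 0;\ k = 1, 2, \ldots, j-1\right\} \tag{A.7.54}$$

then we must also replace $p(s_0) = 1/2^\nu$ by one and $q(u_k) = 1/2$ by

$$c(\epsilon_k) = \begin{cases} \tfrac{1}{2}; & \epsilon_k = 1 \\ \tfrac{1}{2}; & \epsilon_k = -1 \\ 1; & \epsilon_k = 0 \end{cases} \tag{A.7.55}$$

$$c(\epsilon_k) = \left(\tfrac{1}{2}\right)^{\epsilon_k^2} \tag{A.7.56}$$

Note that (A.7.55) takes into account the fact that $u_k$ can be $+1$ or $-1$ when $\epsilon_k = 0$.

Thus the transfer function of (A.7.47) takes on the new form

$$T(z) = \sum_{j=1}^{\infty} \sum_{\mathcal{D}(0,j)} \prod_{k=0}^{j-1} \left(\tfrac{1}{2}\right)^{\epsilon_k^2} z^{\epsilon_k^2} \exp\left\{-\tfrac{1}{2}\left[(h, \delta_k) + h_0 \epsilon_k\right]^2\right\} \tag{A.7.57}$$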


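To evaluate the transfer function $T(z)$, let the set of difference states be

$$\mathcal{D} = \{d_0, d_1, \ldots, d_{L-1}\} \tag{A.7.58}$$

where $L = 3^\nu$. Next define (see (A.6.15))

$$a_{ij} = \begin{cases} \left(\tfrac{z}{2}\right)^{\epsilon^2} \exp\left\{-\tfrac{1}{2}\left[(h, d_i) + h_0 \epsilon\right]^2\right\}; & \text{if } d_j \text{ can be reached from } d_i \text{ with some } \epsilon \\ 0; & \text{if not} \end{cases} \tag{A.7.59}$$

and

$$A = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{L-1,1} \\ a_{12} & a_{22} & \cdots & a_{L-1,2} \\ \vdots & \vdots & & \vdots \\ a_{1,L-1} & a_{2,L-1} & \cdots & a_{L-1,L-1} \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} a_{01} \\ a_{02} \\ \vdots \\ a_{0,L-1} \end{bmatrix}, \quad \mathbf{c} = \begin{bmatrix} a_{10} \\ a_{20} \\ \vdots \\ a_{L-1,0} \end{bmatrix} \tag{A.7.60}$$

Then (see (A.7.11))

$$T(z) = \mathbf{c}^T (I - A)^{-1} \mathbf{b} \tag{A.7.61}$$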

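Consider the example where $\nu = 1$, so that we only have $h_0$ and $h_1$, and the difference states $\delta_k = \epsilon_{k-1}$ are

$$d_0 = 0, \quad d_1 = 1, \quad d_2 = -1 \tag{A.7.62}$$

Figure A-13a shows the difference state diagram with $\epsilon$ as branch values, while Figure A-13b shows the transfer function difference state diagram with $a_{ij}$ as branch values.

[Figure A-13. Example with $\nu = 1$: a) Difference State Diagram; b) Transfer Function Difference State Diagram]

Thus, writing $a = a_{01} = a_{02}$, $b = a_{11} = a_{22}$, $c = a_{12} = a_{21}$ and $d = a_{10} = a_{20}$, we have from (A.7.59)

$$a = \tfrac{z}{2} \exp\left\{-\tfrac{1}{2} h_0^2\right\}, \quad b = \tfrac{z}{2} \exp\left\{-\tfrac{1}{2}(h_0 + h_1)^2\right\}, \quad c = \tfrac{z}{2} \exp\left\{-\tfrac{1}{2}(h_0 - h_1)^2\right\}, \quad d = \exp\left\{-\tfrac{1}{2} h_1^2\right\}$$

and

$$A = \begin{bmatrix} b & c \\ c & b \end{bmatrix}; \quad \mathbf{b} = \begin{bmatrix} a \\ a \end{bmatrix}; \quad \mathbf{c} = \begin{bmatrix} d \\ d \end{bmatrix} \tag{A.7.63}$$

Substituting (A.7.63) into (A.7.61), we then have

$$T(z) = \frac{2ad}{1 - (b + c)} = \frac{z \exp\left\{-\tfrac{1}{2}(h_0^2 + h_1^2)\right\}}{1 - z \exp\left\{-\tfrac{1}{2}(h_0^2 + h_1^2)\right\} \cosh(h_0 h_1)} \tag{A.7.64}$$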
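As a numerical cross-check of the $\nu = 1$ example, the sketch below evaluates the matrix form $\mathbf{c}^T(I - A)^{-1}\mathbf{b}$ and the closed form and compares them. The scalar names `a`, `b`, `c`, `d` are labels chosen here for the four distinct branch values (leave zero, stay, switch sign, return to zero); they follow from (A.7.59) with $\epsilon = \pm 1$ or $\epsilon = 0$ but the lettering itself is an assumption. The function `half_derivative_at_one` assumes the bit error bound takes the form $\tfrac{1}{2}\,dT/dz$ at $z = 1$, with the factor $\tfrac{1}{2}$ taken from Appendix B.

```python
import math

def transfer_function_closed(z, h0, h1):
    """Closed form of T(z) for memory nu = 1."""
    e = math.exp(-0.5 * (h0 ** 2 + h1 ** 2))
    return z * e / (1.0 - z * e * math.cosh(h0 * h1))

def transfer_function_matrix(z, h0, h1):
    """c^T (I - A)^{-1} b for the 2x2 case, written out explicitly."""
    a = 0.5 * z * math.exp(-0.5 * h0 ** 2)           # zero state -> +1 or -1
    b = 0.5 * z * math.exp(-0.5 * (h0 + h1) ** 2)    # nonzero state -> same state
    c = 0.5 * z * math.exp(-0.5 * (h0 - h1) ** 2)    # nonzero state -> opposite sign
    d = math.exp(-0.5 * h1 ** 2)                     # nonzero state -> zero (eps = 0)
    # Inverse of I - A with A = [[b, c], [c, b]] applied to [a, a]^T.
    det = (1.0 - b) ** 2 - c ** 2
    x1 = ((1.0 - b) * a + c * a) / det
    x2 = (c * a + (1.0 - b) * a) / det
    return d * x1 + d * x2

def half_derivative_at_one(h0, h1):
    """(1/2) dT/dz at z = 1 (the factor 1/2 is an assumption, see lead-in)."""
    e = math.exp(-0.5 * (h0 ** 2 + h1 ** 2))
    return 0.5 * e / (1.0 - e * math.cosh(h0 * h1)) ** 2

print(transfer_function_closed(1.0, 1.0, 0.5))
print(transfer_function_matrix(1.0, 1.0, 0.5))
```

Both functions require $b + c < 1$, i.e. $z \exp\{-\tfrac{1}{2}(h_0^2 + h_1^2)\}\cosh(h_0 h_1) < 1$, for the underlying series to converge.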


and the bit error probability bound 



(A. 7. 65) 


We can compare this result with the no intersymbol interference case. This corresponds to conventional BPSK with bit error probability*

$$P_b = Q\left(\sqrt{h_0^2 + h_1^2}\right) < \tfrac{1}{2} \exp\left[-\tfrac{1}{2}(h_0^2 + h_1^2)\right] \tag{A.7.66}$$


*$Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} \exp(-y^2/2)\,dy$ is the usual error probability integral. Also we have normalized both cases to have the same energy.

where this bound on $Q(x)$ is within 0.5 dB for bit error probabilities below $10^{-2}$.
For the special case 


h 


0 



h, - — ^ (A.7.67) 

* 42 

Figure A-14 illustrates the bound on $P_b$ given by (A.7.65) and the bound on $P_b$ given by (A.7.66). For large values of signal-to-noise ratio the difference is asymptotically equal to zero.

Another possible comparison is with a conventional single-sample data detector which makes no use of the energy in the intersymbol interference to improve performance. Here, the average bit error probability is simply given by

$$P_b = \tfrac{1}{2} Q(h_0 + h_1) + \tfrac{1}{2} Q(h_0 - h_1) < \tfrac{1}{4} \exp\left[-\tfrac{1}{2}(h_0 + h_1)^2\right] + \tfrac{1}{4} \exp\left[-\tfrac{1}{2}(h_0 - h_1)^2\right] = \tfrac{1}{2} \exp\left[-\tfrac{1}{2}(h_0^2 + h_1^2)\right] \cosh(h_0 h_1) \tag{A.7.68}$$

This result is also illustrated in Figure A-14. Notice how the Viterbi algorithm has been successful in combating intersymbol interference.
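The comparison between the no-interference case and the single-sample detector can be tabulated numerically. The sketch below is illustrative only: it assumes unit-variance noise as in the text, takes $h_0 \ge h_1 \ge 0$, and uses the standard identity $Q(x) = \tfrac{1}{2}\operatorname{erfc}(x/\sqrt{2})$; the function names are not from the report.

```python
import math

def q_function(x):
    """Gaussian tail integral Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def no_isi_exact(h0, h1):
    """BPSK without intersymbol interference, same total energy."""
    return q_function(math.sqrt(h0 ** 2 + h1 ** 2))

def no_isi_bound(h0, h1):
    """Exponential bound on the no-interference error probability."""
    return 0.5 * math.exp(-0.5 * (h0 ** 2 + h1 ** 2))

def single_sample_exact(h0, h1):
    """Single-sample detector that ignores the interference energy."""
    return 0.5 * q_function(h0 + h1) + 0.5 * q_function(h0 - h1)

def single_sample_bound(h0, h1):
    """Exponential bound on the single-sample detector error probability."""
    return 0.5 * math.exp(-0.5 * (h0 ** 2 + h1 ** 2)) * math.cosh(h0 * h1)

for h0, h1 in [(2.0, 1.0), (3.0, 1.0)]:
    print(h0, h1, no_isi_exact(h0, h1), no_isi_bound(h0, h1),
          single_sample_exact(h0, h1), single_sample_bound(h0, h1))
```

The two bounds differ exactly by the factor $\cosh(h_0 h_1) \ge 1$, which quantifies the penalty of ignoring the interference energy.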


[Figure A-14. Vertical axis: bit error probability]

Several other examples of intersymbol interference channels and their analysis are given in Ref. 1. There, continuous-time signals are reduced to equivalent discrete-time signals and the corresponding transfer function bounds as in (A.7.57) are derived. In these examples one can see further that for a difference state sequence as in (A.7.50) with corresponding error sequence $\epsilon_0, \epsilon_1, \ldots, \epsilon_{j-1}$ there is an equivalent difference state sequence

$$-\delta[0,j] = (-\delta_0, -\delta_1, \ldots, -\delta_j) \tag{A.7.69}$$

with corresponding error sequence $-\epsilon_0, -\epsilon_1, \ldots, -\epsilon_{j-1}$. Both of these have identical transfer function values

$$\prod_{k=0}^{j-1} \left(\tfrac{z}{2}\right)^{\epsilon_k^2} \exp\left\{-\tfrac{1}{2}\left[(h, \delta_k) + h_0 \epsilon_k\right]^2\right\}$$

This means that all nonzero difference states can be merged with their opposite-sign states, resulting in a reduced state diagram consisting of $(3^\nu - 1)/2$ nonzero states (see Figure A-13c for our example).

Next we shall consider an example where the number of states necessary to compute the transfer function bound is actually less than the number of signal states $|S|$.
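The sign symmetry used for merging states can be verified directly on the branch values. The sketch below is a hedged illustration: it assumes the branch value has the form $(z/2)^{\epsilon^2}\exp\{-\tfrac{1}{2}[(h,\delta) + h_0\epsilon]^2\}$ and picks, as a hypothetical convention, the lexicographically larger member of each pair $\{\delta, -\delta\}$ as its representative.

```python
import math
from itertools import product

def branch_value(z, h, h0, delta, eps):
    """Branch weight for difference state delta and error input eps."""
    inner = sum(hi * di for hi, di in zip(h, delta))
    return (z / 2.0) ** (eps * eps) * math.exp(-0.5 * (inner + h0 * eps) ** 2)

def merged_nonzero_states(nu):
    """One representative per pair {delta, -delta} of nonzero difference states."""
    reps = set()
    for delta in product((-1, 0, 1), repeat=nu):
        if any(delta):
            reps.add(max(delta, tuple(-d for d in delta)))
    return sorted(reps)

print(len(merged_nonzero_states(1)), len(merged_nonzero_states(2)))  # 1 4
```

Because the exponent depends on $(h,\delta) + h_0\epsilon$ only through its square, flipping the signs of both $\delta$ and $\epsilon$ leaves every branch value unchanged.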

C. Absolute Difference Sequences 

We examine here another problem where "absolute difference states" are used in the transfer function bound. In particular, we consider a phase estimation problem where we quantize the phase space $(0, 2\pi)$ into $M$ values

$$S = \{\Delta_1, \Delta_2, \ldots, \Delta_M\} \tag{A.7.70}$$

where

$$\Delta_k = k\Delta; \quad k = 1, 2, \ldots, M \tag{A.7.71}$$

The phase sequence is $s_0, s_1, s_2, \ldots$, which we model as

$$s_{k+1} = s_k \oplus u_k; \quad k = 0, 1, 2, \ldots \tag{A.7.72}$$

where the initial phase random variable $s_0$ has the probability

$$p(s_0) = \frac{1}{M}; \quad \text{all } s_0 \in S \tag{A.7.73}$$

and $\{u_k\}$ are i.i.d. random variables with common probability

$$q(u) = \begin{cases} \tfrac{1}{3}; & u = -\Delta, 0, \Delta \\ 0; & \text{otherwise} \end{cases} \tag{A.7.74}$$

Here the symbol $\oplus$ denotes modulo $2\pi$ addition. Thus, at any point in the sequence, the phase may either remain the same or take on one of its two adjacent values, all with equal probability of occurrence.
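The phase model can be simulated on the index grid $1, \ldots, M$. This is a minimal sketch under two assumptions that are not stated explicitly in the surviving text: the quantization step is $\Delta = 2\pi/M$ (consistent with the $M = 8$, $\Delta = \pi/4$ example), and the initial phase is uniform over the $M$ values.

```python
import math
import random

def simulate_phase_indices(M, steps, seed=0):
    """Random walk of phase indices k, where the phase is k * Delta."""
    rng = random.Random(seed)
    k = rng.randrange(1, M + 1)          # initial phase index, uniform over 1..M
    path = [k]
    for _ in range(steps):
        u = rng.choice((-1, 0, 1))       # u_k in {-Delta, 0, Delta}, each with prob 1/3
        k = (k - 1 + u) % M + 1          # modulo-2*pi addition on the index grid
        path.append(k)
    return path

def to_radians(path, M):
    """Convert phase indices to phases in radians, assuming Delta = 2*pi/M."""
    delta = 2.0 * math.pi / M
    return [k * delta for k in path]

path = simulate_phase_indices(8, 10)
print(path)
print(to_radians(path, 8))
```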

The actual signal is assumed to be the sine and cosine of the phase, i.e.,

$$x_k = \begin{bmatrix} \cos(s_k \oplus u_k) \\ \sin(s_k \oplus u_k) \end{bmatrix} = \begin{bmatrix} \cos s_{k+1} \\ \sin s_{k+1} \end{bmatrix} \tag{A.7.75}$$

Figure A-15 shows the state diagram for this signal for $M = 8$ and $\Delta = \pi/4$. The branch values are the signal inputs $u \in U$.

Suppose the channel adds zero-mean independent Gaussian random variables to each component, resulting in the channel output vector

$$y_k = a x_k + n_k \tag{A.7.76}$$


where 


a = 



(A. 7. 77) 
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and 



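$$n_k = \begin{bmatrix} n_{k1} \\ n_{k2} \end{bmatrix} \tag{A.7.78}$$

with $n_{k1}$ and $n_{k2}$ having unit variance. Also assume that the receiver uses the maximum likelihood metric analogous to (A.7.24), namely

$$m(y_k, \hat{x}_k) = (y_k, \hat{x}_k) \tag{A.7.79}$$

and the squared error distortion measure,

$$d((s_k,u_k),(\hat{s}_k,\hat{u}_k)) = \min\left\{ |s_{k+1} \ominus \hat{s}_{k+1}|^2,\ |\hat{s}_{k+1} \ominus s_{k+1}|^2 \right\} \tag{A.7.80}$$

where $\ominus$ denotes the difference modulo $2\pi$. Note that this distortion depends only on the absolute difference between $s_{k+1}$ and $\hat{s}_{k+1}$ (modulo $2\pi$), which has values in

$$\mathcal{D} = \{\Delta_0, \Delta_1, \ldots, \Delta_{M/2}\}, \quad \Delta_0 = 0,\ M \text{ even} \tag{A.7.81}$$

Furthermore let $\delta_{k+1}$ be defined as this absolute difference, namely,

$$\delta_{k+1} = \sqrt{d((s_k,u_k),(\hat{s}_k,\hat{u}_k))} \tag{A.7.82}$$

which as stated above has values only in $\mathcal{D}$.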

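Substituting (A.7.75) into (A.7.76), the metric of (A.7.79) is evaluated as

$$(y_k, \hat{x}_k) = a \cos(s_{k+1} \ominus \hat{s}_{k+1}) + \tilde{n}_k \tag{A.7.83}$$

where $\tilde{n}_k = n_{k1} \cos \hat{s}_{k+1} + n_{k2} \sin \hat{s}_{k+1}$ is a zero-mean unit-variance Gaussian random variable. Note that

$$\cos(s_{k+1} \ominus \hat{s}_{k+1}) = \cos(\hat{s}_{k+1} \ominus s_{k+1}) = \cos\left[\sqrt{d((s_k,u_k),(\hat{s}_k,\hat{u}_k))}\right] = \cos \delta_{k+1} \tag{A.7.84}$$

Hence, both the distortion measure and the metric depend only on the absolute differences $\delta_{k+1}$.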
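The claim that the cosine term depends only on the absolute phase difference can be checked with a few lines of code. The helper below is an illustrative sketch (its name is not from the report); it returns the absolute difference modulo $2\pi$ as a value in $[0, \pi]$.

```python
import math

def absolute_difference(s, s_hat):
    """Absolute phase difference modulo 2*pi, a value in [0, pi]."""
    d = (s - s_hat) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)

delta = math.pi / 4
for k, k_hat in [(1, 8), (3, 7), (2, 5)]:
    s, s_hat = k * delta, k_hat * delta
    print(absolute_difference(s, s_hat), math.cos(s - s_hat))
```

Since the cosine is even and $2\pi$-periodic, $\cos(s - \hat{s})$ equals the cosine of this absolute difference for every pair of phases.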

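Since we use a maximum likelihood metric, the Chernoff bound results in the Bhattacharyya bound

$$\int \sqrt{p(y_k|x_k)\,p(y_k|\hat{x}_k)}\;dy_k = \exp\left\{-\frac{a^2}{4}\left[1 - \cos \delta_{k+1}\right]\right\} \tag{A.7.85}$$

Thus in addition to the metric, the Chernoff bound also depends only on the absolute differences $\delta_{k+1}$. As in the previous example, define an error term for each $k$ by*

$$\epsilon_k = u_k - \hat{u}_k \in \{-2\Delta, -\Delta, 0, \Delta, 2\Delta\} \tag{A.7.86}$$

Then, an absolute difference process can be given by

$$\delta_{k+1} = \begin{cases} |\delta_k + \epsilon_k|; & \delta_k \ne \pi - \Delta,\ \pi,\ \text{all } \epsilon_k \\ |\delta_k + \epsilon_k|; & \delta_k = \pi - \Delta,\ \epsilon_k \ne 2\Delta \\ \pi - \Delta; & \delta_k = \pi - \Delta,\ \epsilon_k = 2\Delta \\ \pi - |\epsilon_k|; & \delta_k = \pi,\ \text{all } \epsilon_k \end{cases} \tag{A.7.87}$$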


We now consider a transfer function bound for this problem. The general 
transfer function bound T(z) given by (A. 6. 14) which when using (A. 7. 82) and 
(A. 7. 85). i.e.. 


2 

^.d ((\.\) . (Sj^.up) . (s^^.u^) ) = exp \ -^[l - 


cos 6^^^] j 


(A. 7. 88) 


has the form 


T(z) 


S 2 7T^ {“ I (A. 7.89) 

j=l 5(0.j) k=0 ' ' 


*Note (A.7.86) is analogous to (A.7.44) except for a factor of two.

Recall $S(0,j)$ is the set of sequences $s[0,j]$ and $\hat{s}[0,j]$ that diverge at the initial node and remerge $j$ branches later in the trellis diagram. This corresponds to a particular absolute difference state sequence as in (A.7.50) where now

$$\delta_0 = 0, \quad \delta_j = 0, \quad \delta_k \ne 0; \quad k = 1, 2, \ldots, j-1 \tag{A.7.90}$$

We replace the sum over $S(0,j)$ by the sum over all absolute difference state sequences as in (A.7.54). Next, we replace $p(s_0)$ by one and $q(u_k)$ by a function $c(\epsilon_k)$ whose definition we shall now examine by considering the values of $u_k$ and $\hat{u}_k$ associated with each value of $\epsilon_k$ as follows:

$$\begin{aligned}
\epsilon_k = 0 \quad &\text{when } u_k = \hat{u}_k = -\Delta,\ \ u_k = \hat{u}_k = 0,\ \text{ or } u_k = \hat{u}_k = \Delta \\
\epsilon_k = \Delta \quad &\text{when } u_k = 0,\ \hat{u}_k = -\Delta \ \text{ or } u_k = \Delta,\ \hat{u}_k = 0 \\
\epsilon_k = -\Delta \quad &\text{when } u_k = -\Delta,\ \hat{u}_k = 0 \ \text{ or } u_k = 0,\ \hat{u}_k = \Delta \\
\epsilon_k = 2\Delta \quad &\text{when } u_k = \Delta,\ \hat{u}_k = -\Delta \\
\epsilon_k = -2\Delta \quad &\text{when } u_k = -\Delta,\ \hat{u}_k = \Delta
\end{aligned} \tag{A.7.91}$$

From this we have

$$c(\epsilon_k) = \begin{cases} 1; & \epsilon_k = 0 \\ \tfrac{2}{3}; & \epsilon_k = \pm\Delta \\ \tfrac{1}{3}; & \epsilon_k = \pm 2\Delta \end{cases} \tag{A.7.92}$$

where we count the number of distinct values of $u_k$ for each $\epsilon_k$ and multiply by $q(u_k) = \tfrac{1}{3}$ as in (A.7.74).
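The counting argument for $c(\epsilon)$ can be reproduced by enumerating all input pairs. The sketch below works in units of $\Delta$ and uses exact fractions; the function name is illustrative.

```python
from collections import Counter
from fractions import Fraction

def error_weights():
    """c(eps), with eps in units of Delta: (number of u giving eps) * q(u), q(u) = 1/3."""
    inputs = (-1, 0, 1)                                   # u / Delta
    counts = Counter(u - u_hat for u in inputs for u_hat in inputs)
    return {eps: Fraction(n, 3) for eps, n in counts.items()}

weights = error_weights()
for eps in sorted(weights):
    print(eps, weights[eps])
```

The enumeration gives weight 1 for $\epsilon = 0$, $\tfrac{2}{3}$ for $\epsilon = \pm\Delta$ and $\tfrac{1}{3}$ for $\epsilon = \pm 2\Delta$.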

The final transfer function has the form 



(A. 7. 93) 


Note that (A. 7. 93) can be evaluated using only M/2 states which is even less 
than the number of states required for the Viterbi algorithm. 


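Now we define the branch transfer functions

$$a_{ij} = \begin{cases} c_{ij}\, z^{\Delta_j^2} \exp\left\{-\frac{a^2}{4}\left[1 - \cos \Delta_j\right]\right\}; & \text{if state } \Delta_j \text{ can be reached from state } \Delta_i \\ 0; & \text{if not} \end{cases} \tag{A.7.94}$$

where $c_{ij}$ is the sum of the numbers $c(\epsilon)$ corresponding to all error inputs $\epsilon$ that can cause* a transition from state $\Delta_i$ to state $\Delta_j$. Then defining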


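*Note that in (A.7.87), we see that $\epsilon = \Delta$ or $\epsilon = -\Delta$ can cause a transition from $\delta_k = 0$ to $\delta_{k+1} = \Delta_1 = \Delta$.

$$A = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{M/2,1} \\ a_{12} & a_{22} & \cdots & a_{M/2,2} \\ \vdots & \vdots & & \vdots \\ a_{1,M/2} & a_{2,M/2} & \cdots & a_{M/2,M/2} \end{bmatrix} \tag{A.7.95}$$

$$\mathbf{b} = \begin{bmatrix} a_{01} \\ a_{02} \\ \vdots \\ a_{0,M/2} \end{bmatrix}; \quad \mathbf{c} = \begin{bmatrix} a_{10} \\ a_{20} \\ \vdots \\ a_{M/2,0} \end{bmatrix} \tag{A.7.96}$$

we may use the simple transfer function of (A.7.61).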

For the $M = 8$ example of Figure A-15 we have

$$\mathcal{D} = \{0, \Delta_1, \Delta_2, \Delta_3, \Delta_4\} \tag{A.7.97}$$

and the state diagram for the absolute difference process given by (A.7.87) is illustrated in Figure A-16. Here the branch values are the input errors of (A.7.86).

[Figure A-16. Absolute Difference State Diagram]

The corresponding transfer function absolute difference state diagram is shown in Figure A-17. Here we have the branch transfer functions given by


^01 V 


^43 ®42 =|“<V 


a^l = 3a(A^), a 22 = 01(42) 


^33 = 3“S^‘ ^4 ' “^V 




=3Qi(A^_j); 1 = 1,2, 3, 4 


“1,1+2 ° 3“<*l+2>* ^ 


“ 1 . 1-2 ■ 1 “<‘l- 2 ’' ^ 


(A. 7. 98) 


where 


a(A) = exp |- "^[l - cos Alj (A. 7. 99) 

In arriving at (A.7.98), we have made use of the fact that when two values of $\epsilon_k$ can cause the same transition between states, then the branch functions corresponding to the different values of $\epsilon_k$ can be added together to form a single branch function. Thus, for this example, we have from (A.7.44) and (A.7.96) that

$$A = \begin{bmatrix} a_{11} & a_{21} & a_{31} & 0 \\ a_{12} & a_{22} & a_{32} & a_{42} \\ a_{13} & a_{23} & a_{33} & a_{43} \\ 0 & a_{24} & a_{34} & a_{44} \end{bmatrix} \tag{A.7.100}$$

and

$$\mathbf{b} = \begin{bmatrix} a_{01} \\ a_{02} \\ 0 \\ 0 \end{bmatrix}; \quad \mathbf{c} = \begin{bmatrix} a_{10} \\ a_{20} \\ 0 \\ 0 \end{bmatrix} \tag{A.7.101}$$

[Figure A-17. Transfer function absolute difference state diagram]

Figure A-18 illustrates the mean square phase error bound, as computed from (A.5.13) together with (A.7.61), (A.7.100) and (A.7.101), using the maximum likelihood Viterbi algorithm, which in this application is basically a smoothing algorithm. This is shown for $M = 8$ as a function of the signal energy-to-noise ratio $E_s/N_0$. For large $E_s/N_0$ the remaining error is due to quantization of the $2\pi$ interval into $M$ quantized values.
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APPENDIX B 


A Factor of One Half in Error Probability Bounds 

I. Introduction 

In many complex communication systems, error probabilities are often 
difficult to evaluate, and thus, easily computed bounds are highly desirable. 
Two such bounds are the Chernoff bound and the Bhattacharyya bound (Ref. 1). 

For any error probability bound, one desires that it be as tight as 
possible. Jacobs (Ref. 2) gave sufficient conditions for reducing the standard 
Chernoff bound by a factor of one half. In this appendix, we rederive this 
result and give less restrictive but harder to verify sufficient conditions. 

We also present some related results of Hellman and Raviv (Ref. 3) which show that all Bhattacharyya bounds can be reduced by a factor of one half.

II. Decision Function and Error Probability Models 

Let $Z$ be a continuous random variable that can have one of two probability densities:

$$\begin{aligned} H_1 &: f_1(z), \quad -\infty < z < \infty \\ H_2 &: f_2(z), \quad -\infty < z < \infty \end{aligned} \tag{B.2.1}$$

where the a priori probabilities for these two hypotheses are denoted by

$$\pi_1 = \Pr\{H_1\} \quad \text{and} \quad \pi_2 = \Pr\{H_2\} = 1 - \pi_1 \tag{B.2.2}$$

We assume an arbitrary deterministic decision rule characterized by the following binary-valued decision function: given an observed value $z$ of the random variable $Z$, then if

$$\phi(z) = 1, \quad \text{decide } H_1 \tag{B.2.3a}$$

and if

$$\phi(z) = 0, \quad \text{decide } H_2 \tag{B.2.3b}$$

In terms of this decision function, we have conditional error probabilities

$$P_{E_1} = \Pr\{\text{decide } H_2 \mid H_1\} = \int_{-\infty}^{\infty} [1 - \phi(z)] f_1(z)\,dz \tag{B.2.4}$$

and

$$P_{E_2} = \Pr\{\text{decide } H_1 \mid H_2\} = \int_{-\infty}^{\infty} \phi(z) f_2(z)\,dz \tag{B.2.5}$$

The average error probability is

$$P_E = \pi_1 P_{E_1} + \pi_2 P_{E_2} = \int_{-\infty}^{\infty} \left\{\pi_1 f_1(z)[1 - \phi(z)] + \pi_2 f_2(z)\phi(z)\right\} dz \tag{B.2.6}$$

In the following, we examine Bhattacharyya and Chernoff bounds for various decision rules.

III. Maximum A Posteriori (MAP) Decision Rule 

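The decision rule that minimizes $P_E$ is the MAP rule

$$\phi(z) = \begin{cases} 1, & \pi_1 f_1(z) \ge \pi_2 f_2(z) \\ 0, & \pi_1 f_1(z) < \pi_2 f_2(z) \end{cases} \tag{B.3.1}$$

which satisfies the inequalities

$$\phi(z) \le \left[\frac{\pi_1 f_1(z)}{\pi_2 f_2(z)}\right]^{\alpha} \tag{B.3.2}$$

$$1 - \phi(z) \le \left[\frac{\pi_2 f_2(z)}{\pi_1 f_1(z)}\right]^{\beta} \tag{B.3.3}$$

for any $\alpha \ge 0$, $\beta \ge 0$. These inequalities are typically used in (B.2.4) and (B.2.5) to derive the bounds

$$P_{E_1} \le \int_{-\infty}^{\infty} \left[\frac{\pi_2 f_2(z)}{\pi_1 f_1(z)}\right]^{\beta} f_1(z)\,dz \tag{B.3.4}$$

and

$$P_{E_2} \le \int_{-\infty}^{\infty} \left[\frac{\pi_1 f_1(z)}{\pi_2 f_2(z)}\right]^{\alpha} f_2(z)\,dz \tag{B.3.5}$$

Next define the function

$$B(\alpha) = \int_{-\infty}^{\infty} f_1(z)^{\alpha} f_2(z)^{1-\alpha}\,dz \tag{B.3.6}$$

Then from (B.2.6), the average error probability has the upper bound

$$P_E \le \pi_1^{1-\beta} \pi_2^{\beta} B(1 - \beta) + \pi_1^{\alpha} \pi_2^{1-\alpha} B(\alpha) \tag{B.3.7}$$

for any $\alpha \ge 0$, $\beta \ge 0$. In general we would choose $\alpha$ and $\beta$ to minimize the bounds of (B.3.4) and (B.3.5). The special case where

$$\alpha = \beta = \tfrac{1}{2} \tag{B.3.8}$$

results in the Bhattacharyya bound

$$P_E \le 2\sqrt{\pi_1 \pi_2}\, B(\tfrac{1}{2}) \le B(\tfrac{1}{2}) = \int_{-\infty}^{\infty} \sqrt{f_1(z) f_2(z)}\,dz \tag{B.3.9}$$

since

$$\sqrt{\pi_1 \pi_2} \le \tfrac{1}{2} \tag{B.3.10}$$

In most cases of interest, such as when*

$$f_1(z) = f_2(-z) \quad \text{for all } z \tag{B.3.11}$$

we have $\alpha = \tfrac{1}{2}$ minimizing the function $B(\alpha)$.

*When $f_1(z)$ and $f_2(z)$ are conditional probabilities of a communication channel model, this is usually the case.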
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Let us now reexamine the general form for the average error probability 
using the MAP decision rule. Note from (B.2.6) and (B.3.1) that 


$$P_E = \int_{-\infty}^{\infty} \left\{\pi_1 f_1(z)[1 - \phi(z)] + \pi_2 f_2(z)\phi(z)\right\} dz = \int_{-\infty}^{\infty} \min\{\pi_1 f_1(z), \pi_2 f_2(z)\}\,dz \tag{B.3.12}$$

Following Hellman and Raviv (Ref. 3) we note that for any $a \ge 0$, $b \ge 0$ and $0 \le \alpha \le 1$ we have

$$\min\{a, b\} \le a^{\alpha} b^{1-\alpha} \tag{B.3.13}$$

This yields the upper bound on the average error probability

$$P_E \le \int_{-\infty}^{\infty} [\pi_1 f_1(z)]^{\alpha} [\pi_2 f_2(z)]^{1-\alpha}\,dz = \pi_1^{\alpha} \pi_2^{1-\alpha} B(\alpha) \tag{B.3.14}$$

Since the minimizing choice of $\alpha$ is in the unit interval $[0,1]$, this bound is always a factor of one-half smaller than the bound given in (B.3.7). In particular for the Bhattacharyya bound where $\alpha = \tfrac{1}{2}$ this bound, due to Hellman and Raviv, is always a factor of one half smaller, i.e.,

$$P_E \le \sqrt{\pi_1 \pi_2} \int_{-\infty}^{\infty} \sqrt{f_1(z) f_2(z)}\,dz \le \tfrac{1}{2} \int_{-\infty}^{\infty} \sqrt{f_1(z) f_2(z)}\,dz \tag{B.3.15}$$

Thus, the commonly used Bhattacharyya bound, particularly in its application to deriving transfer function bit error probability bounds for convolutional codes, can be tightened by a factor of one-half.
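The factor of one half can be illustrated numerically for a simple pair of densities. The sketch below is a hypothetical example, not one from the report: it assumes $f_1 = N(+m, 1)$, $f_2 = N(-m, 1)$ and equal priors, for which the exact MAP error is $Q(m)$ and $B(\tfrac{1}{2}) = e^{-m^2/2}$. The integral is estimated with a plain midpoint rule.

```python
import math

def q_function(x):
    """Gaussian tail integral Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def map_error_equal_priors(m):
    """Exact P_E for f1 = N(+m, 1), f2 = N(-m, 1), pi1 = pi2 = 1/2."""
    return q_function(m)

def bhattacharyya_integral(m, lo=-12.0, hi=12.0, n=20000):
    """Midpoint-rule estimate of B(1/2) = integral of sqrt(f1 * f2)."""
    dz = (hi - lo) / n
    norm = math.sqrt(2.0 * math.pi)
    total = 0.0
    for i in range(n):
        z = lo + (i + 0.5) * dz
        f1 = math.exp(-0.5 * (z - m) ** 2) / norm
        f2 = math.exp(-0.5 * (z + m) ** 2) / norm
        total += math.sqrt(f1 * f2) * dz
    return total

for m in (0.5, 1.5, 3.0):
    b_half = bhattacharyya_integral(m)
    print(m, map_error_equal_priors(m), 0.5 * b_half, b_half)
```

For each $m$ the exact error stays below $\tfrac{1}{2}B(\tfrac{1}{2})$, which is itself half of the commonly quoted bound $B(\tfrac{1}{2})$.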

IV. Maximum Likelihood (ML) Decision Rule 
The ML decision rule, namely,

$$\phi(z) = \begin{cases} 1, & f_1(z) \ge f_2(z) \\ 0, & f_1(z) < f_2(z) \end{cases} \tag{B.4.1}$$

tends to keep both conditional probabilities closer in value but only minimizes $P_E$ when $\pi_1 = \pi_2 = \tfrac{1}{2}$, the equal a priori probability case. In general, we have inequalities

$$\phi(z) \le \left[\frac{f_1(z)}{f_2(z)}\right]^{\alpha} \tag{B.4.2}$$

and

$$[1 - \phi(z)] \le \left[\frac{f_2(z)}{f_1(z)}\right]^{\beta} \tag{B.4.3}$$

resulting in conditional error bounds

$$P_{E_1} \le B(1 - \beta) \tag{B.4.4}$$

and

$$P_{E_2} \le B(\alpha) \tag{B.4.5}$$

The average error probability is simply bounded by

$$P_E \le \pi_1 B(1 - \beta) + \pi_2 B(\alpha) \tag{B.4.6}$$

while the choice $\alpha = \beta = \tfrac{1}{2}$, which often minimizes this bound, yields the usual Bhattacharyya bound

$$P_E \le B(\tfrac{1}{2}) \tag{B.4.7}$$

since $\pi_1 + \pi_2 = 1$.

Again using the inequality (B.3.13), we can show a tighter bound as follows:

$$\begin{aligned}
P_E &= \int_{-\infty}^{\infty} \left\{\pi_1 f_1(z)[1 - \phi(z)] + \pi_2 f_2(z)\phi(z)\right\} dz \\
&\le \max\{\pi_1, \pi_2\} \int_{-\infty}^{\infty} \left\{f_1(z)[1 - \phi(z)] + f_2(z)\phi(z)\right\} dz \\
&= \max\{\pi_1, \pi_2\} \int_{-\infty}^{\infty} \min\{f_1(z), f_2(z)\}\,dz \\
&\le \max\{\pi_1, \pi_2\} \int_{-\infty}^{\infty} f_1(z)^{\alpha} f_2(z)^{1-\alpha}\,dz \\
&= \max\{\pi_1, \pi_2\}\, B(\alpha)
\end{aligned} \tag{B.4.8}$$

for $0 \le \alpha \le 1$. For the case where $\pi_1 = \pi_2 = \tfrac{1}{2}$ and $\alpha = \tfrac{1}{2}$ we again reduce the bound of (B.4.7) by a factor of one half. Most cases of interest have equal a priori probabilities.

V. Maximum Metric and Chernoff Bounds 

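We now assume that $Z$ is some sort of metric used to make the decision such that for the particular value $Z = z$, we have the rule:

$$\begin{aligned} &\text{If } z \ge 0, \text{ choose } H_1 \\ &\text{If } z < 0, \text{ choose } H_2 \end{aligned} \tag{B.5.1}$$

The decision function is then

$$\phi(z) = \begin{cases} 1, & z \ge 0 \\ 0, & z < 0 \end{cases} \tag{B.5.2}$$

and conditional errors are

$$P_{E_1} = \int_{-\infty}^{\infty} [1 - \phi(z)] f_1(z)\,dz = \int_{-\infty}^{0} f_1(z)\,dz \tag{B.5.3}$$

and

$$P_{E_2} = \int_{-\infty}^{\infty} \phi(z) f_2(z)\,dz = \int_{0}^{\infty} f_2(z)\,dz \tag{B.5.4}$$

For $\alpha \ge 0$ and $\beta \ge 0$ we have the standard Chernoff bounds

$$P_{E_1} \le \int_{-\infty}^{\infty} e^{-\alpha z} f_1(z)\,dz \triangleq C_1(\alpha) \tag{B.5.5}$$

and

$$P_{E_2} \le \int_{-\infty}^{\infty} e^{\beta z} f_2(z)\,dz \triangleq C_2(\beta) \tag{B.5.6}$$

Thus, the average error probability has the upper bound

$$P_E \le \pi_1 C_1(\alpha) + \pi_2 C_2(\beta) \tag{B.5.7}$$

Note that in general if $P_{E_1}$ and $P_{E_2}$ are less than 0.5 then the Chernoff bounds are minimized by nonnegative parameters $\alpha$ and $\beta$. Hence, in that case, the minimization in the Chernoff bounds may be carried out over all real values of $\alpha$ and $\beta$.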

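Jacobs (Ref. 2) considered the conditions

$$f_1(-z) \ge f_1(z) \quad \text{all } z \le 0 \tag{B.5.8a}$$

and

$$f_2(-z) \ge f_2(z) \quad \text{all } z \ge 0 \tag{B.5.8b}$$

Then, using the inequality

$$\frac{e^{\omega} + e^{-\omega}}{2} = \cosh \omega \ge 1 \quad \text{for all } \omega \tag{B.5.9}$$

and appropriate changes of variables of integration, he showed the following inequalities:

$$\begin{aligned}
C_1(\alpha) &= \int_{-\infty}^{\infty} e^{-\alpha z} f_1(z)\,dz \\
&= \int_{-\infty}^{0} e^{-\alpha z} f_1(z)\,dz + \int_{0}^{\infty} e^{-\alpha z} f_1(z)\,dz \\
&= \int_{-\infty}^{0} e^{-\alpha z} f_1(z)\,dz + \int_{-\infty}^{0} e^{\alpha z} f_1(-z)\,dz \\
&\ge \int_{-\infty}^{0} e^{-\alpha z} f_1(z)\,dz + \int_{-\infty}^{0} e^{\alpha z} f_1(z)\,dz \\
&= 2 \int_{-\infty}^{0} \cosh(\alpha z)\, f_1(z)\,dz \\
&\ge 2 \int_{-\infty}^{0} f_1(z)\,dz
\end{aligned} \tag{B.5.10}$$

or

$$P_{E_1} \le \tfrac{1}{2} C_1(\alpha) \tag{B.5.11}$$

Similarly, it can be shown that

$$P_{E_2} \le \tfrac{1}{2} C_2(\beta) \tag{B.5.12}$$

Thus the often satisfied condition given by Jacobs in (B.5.8) results in a factor of one half in the usual Chernoff bounds.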
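A Gaussian metric gives a quick sanity check of the halved Chernoff bound. The sketch below is a hypothetical example under the assumption that $Z \sim N(\mu, \sigma^2)$ with $\mu > 0$ under $H_1$, a density that satisfies $f_1(-z) \ge f_1(z)$ for all $z \le 0$. In that case $C_1(\alpha) = \exp(-\alpha\mu + \alpha^2\sigma^2/2)$; the minimization is done on a coarse grid purely for illustration.

```python
import math

def chernoff_c1(alpha, mu, sigma):
    """C_1(alpha) = E[exp(-alpha * Z) | H_1] for Z ~ N(mu, sigma^2)."""
    return math.exp(-alpha * mu + 0.5 * (alpha * sigma) ** 2)

def minimized_chernoff(mu, sigma, grid=2000, alpha_max=10.0):
    """Grid search for the smallest C_1(alpha) over 0 <= alpha <= alpha_max."""
    return min(chernoff_c1(alpha_max * i / grid, mu, sigma) for i in range(grid + 1))

def exact_error(mu, sigma):
    """P_E1 = Pr{Z < 0 | H_1} for the Gaussian metric."""
    return 0.5 * math.erfc(mu / (sigma * math.sqrt(2.0)))

for mu in (1.0, 2.0, 3.0):
    c_min = minimized_chernoff(mu, 1.0)
    print(mu, exact_error(mu, 1.0), 0.5 * c_min, c_min)
```

In each case the exact error lies below half of the minimized Chernoff bound, as the inequality $P_{E_1} \le \tfrac{1}{2}C_1(\alpha)$ predicts under Jacobs' condition.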

Less restrictive but more difficult to prove conditions are that

$$\int_{0}^{\infty} e^{-\alpha^* z} f_1(z)\,dz \ge \int_{-\infty}^{0} e^{\alpha^* z} f_1(z)\,dz \tag{B.5.13a}$$

and

$$\int_{-\infty}^{0} e^{\beta^* z} f_2(z)\,dz \ge \int_{0}^{\infty} e^{-\beta^* z} f_2(z)\,dz \tag{B.5.13b}$$

where $\alpha^*$ minimizes $C_1(\alpha)$ of (B.5.5) and $\beta^*$ minimizes $C_2(\beta)$ of (B.5.6). Note that, for the special case of $\alpha^* = 0$, we have

$$\int_{0}^{\infty} f_1(z)\,dz = 1 - P_{E_1} \ge \int_{-\infty}^{0} f_1(z)\,dz = P_{E_1} \tag{B.5.14}$$

which is always satisfied when

$$P_{E_1} \le \tfrac{1}{2} \tag{B.5.15}$$

Similarly for $\beta^* = 0$, we would have

$$\int_{-\infty}^{0} f_2(z)\,dz = 1 - P_{E_2} \ge \int_{0}^{\infty} f_2(z)\,dz = P_{E_2} \tag{B.5.16}$$

which is always satisfied when

$$P_{E_2} \le \tfrac{1}{2} \tag{B.5.17}$$

Indeed conditions (B.5.13) are also true for some nonnegative range of $\alpha^*$ and $\beta^*$ values. We assume they hold for the minimizing choices of the Chernoff bound parameters. Note that conditions (B.5.13) are less restrictive than those of (B.5.8) since the latter imply the former but not vice versa.

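Now consider the inequalities

$$\begin{aligned}
C_1(\alpha) \ge C_1(\alpha^*) &= \int_{-\infty}^{\infty} e^{-\alpha^* z} f_1(z)\,dz \\
&= \int_{-\infty}^{0} e^{-\alpha^* z} f_1(z)\,dz + \int_{0}^{\infty} e^{-\alpha^* z} f_1(z)\,dz \\
&\ge \int_{-\infty}^{0} e^{-\alpha^* z} f_1(z)\,dz + \int_{-\infty}^{0} e^{\alpha^* z} f_1(z)\,dz \\
&= 2 \int_{-\infty}^{0} \cosh(\alpha^* z)\, f_1(z)\,dz \\
&\ge 2 \int_{-\infty}^{0} f_1(z)\,dz
\end{aligned} \tag{B.5.18}$$

or

$$P_{E_1} \le \tfrac{1}{2} C_1(\alpha) \tag{B.5.19}$$

Similarly

$$P_{E_2} \le \tfrac{1}{2} C_2(\beta) \tag{B.5.20}$$

Thus, since (B.5.19) and (B.5.20) are identical, respectively, to (B.5.11) and (B.5.12), we have shown that the less restrictive conditions of (B.5.13) result in a factor of one-half in the usual Chernoff bounds.

Next for the special case where

$$\pi_1 = \pi_2 = \tfrac{1}{2} \tag{B.5.21}$$

and

$$\alpha^* = \beta^* \tag{B.5.22}$$

sufficient conditions can alternately be given by

$$\int_{0}^{\infty} e^{-\alpha^* z} f_1(z)\,dz \ge \int_{0}^{\infty} e^{-\alpha^* z} f_2(z)\,dz \tag{B.5.23a}$$

and

$$\int_{-\infty}^{0} e^{\alpha^* z} f_2(z)\,dz \ge \int_{-\infty}^{0} e^{\alpha^* z} f_1(z)\,dz \tag{B.5.23b}$$

Note that these conditions are always satisfied if our decision rule is a maximum likelihood decision rule where

$$f_2(z) \le f_1(z) \quad \text{for all } z \ge 0 \tag{B.5.24a}$$

and

$$f_2(z) \ge f_1(z) \quad \text{for all } z < 0 \tag{B.5.24b}$$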

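Assuming conditions (B.5.23) we have

$$\begin{aligned}
C_1(\alpha) + C_2(\beta) &\ge C_1(\alpha^*) + C_2(\alpha^*) \\
&= \int_{-\infty}^{\infty} e^{-\alpha^* z} f_1(z)\,dz + \int_{-\infty}^{\infty} e^{\alpha^* z} f_2(z)\,dz \\
&= \int_{-\infty}^{0} e^{-\alpha^* z} f_1(z)\,dz + \int_{0}^{\infty} e^{-\alpha^* z} f_1(z)\,dz + \int_{-\infty}^{0} e^{\alpha^* z} f_2(z)\,dz + \int_{0}^{\infty} e^{\alpha^* z} f_2(z)\,dz \\
&\ge \int_{-\infty}^{0} e^{-\alpha^* z} f_1(z)\,dz + \int_{0}^{\infty} e^{-\alpha^* z} f_2(z)\,dz + \int_{-\infty}^{0} e^{\alpha^* z} f_1(z)\,dz + \int_{0}^{\infty} e^{\alpha^* z} f_2(z)\,dz \\
&= 2 \int_{-\infty}^{0} \cosh(\alpha^* z)\, f_1(z)\,dz + 2 \int_{0}^{\infty} \cosh(\alpha^* z)\, f_2(z)\,dz \\
&\ge 2 P_{E_1} + 2 P_{E_2}
\end{aligned} \tag{B.5.25}$$

or

$$P_E = \tfrac{1}{2} P_{E_1} + \tfrac{1}{2} P_{E_2} \le \tfrac{1}{4} C_1(\alpha) + \tfrac{1}{4} C_2(\beta) \tag{B.5.26}$$

which is again a factor of one half less than the original Chernoff bound on the average error probability (B.5.7) for $\pi_1 = \pi_2 = \tfrac{1}{2}$.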

For the special case where $Z$ happens to be a maximum likelihood metric, i.e.,

$$\frac{f_1(z)}{f_2(z)} = e^{z} \quad \text{or} \quad z = \ln \frac{f_1(z)}{f_2(z)} \tag{B.5.27}$$

then the conditions (B.5.23) hold, whereupon

$$C_1(\alpha) = \int_{-\infty}^{\infty} e^{-\alpha z} f_1(z)\,dz = \int_{-\infty}^{\infty} \left[\frac{f_2(z)}{f_1(z)}\right]^{\alpha} f_1(z)\,dz = \int_{-\infty}^{\infty} f_1(z)^{1-\alpha} f_2(z)^{\alpha}\,dz = B(1 - \alpha) \tag{B.5.28}$$

and

$$C_2(\beta) = \int_{-\infty}^{\infty} e^{\beta z} f_2(z)\,dz = \int_{-\infty}^{\infty} \left[\frac{f_1(z)}{f_2(z)}\right]^{\beta} f_2(z)\,dz = \int_{-\infty}^{\infty} f_1(z)^{\beta} f_2(z)^{1-\beta}\,dz = B(\beta) \tag{B.5.29}$$

where $B(\cdot)$ is defined in (B.3.6). Recall that $B(\tfrac{1}{2})$ is the Bhattacharyya bound. Thus, (B.5.28) and (B.5.29) together with (B.2.6), (B.5.19), and (B.5.20) again show a reduction by a factor of one half in the bound of (B.4.6).

VI. Applications 

In most applications of interest, we consider two sequences of length $N$,

$$\mathbf{x}_1, \mathbf{x}_2 \in \mathcal{X}^N$$

that can be transmitted over a memoryless channel with input alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$ and conditional probability

$$p(y|x); \quad x \in \mathcal{X},\ y \in \mathcal{Y}$$

This is shown in the following figure.

[Figure B-1. A Simple Example - One of Two Sequences Transmitted over a Memoryless Channel. Input $\mathbf{x}$, memoryless channel $p(y|x)$ with $p_N(\mathbf{y}|\mathbf{x}) = \prod_{n=1}^{N} p(y_n|x_n)$, output $\mathbf{y}$.]

The receiver obtains a sequence

$$\mathbf{y} = (y_1, y_2, \ldots, y_N) \in \mathcal{Y}^N \tag{B.6.1}$$

from the channel and must decide between the two hypotheses

$$\begin{aligned} H_1 &: \mathbf{x}_1 \text{ is sent} \\ H_2 &: \mathbf{x}_2 \text{ is sent} \end{aligned} \tag{B.6.2}$$

which have a priori probabilities given by (B.2.2). The receiver will typically use a metric

$$m(y, x); \quad x \in \mathcal{X},\ y \in \mathcal{Y}$$

and the corresponding decision rule where, if and only if

$$\sum_{n=1}^{N} m(y_n, x_{1n}) \ge \sum_{n=1}^{N} m(y_n, x_{2n}) \tag{B.6.3}$$

do we choose $H_1$. By defining the random variable

$$Z = \sum_{n=1}^{N} \left[m(y_n, x_{1n}) - m(y_n, x_{2n})\right] \tag{B.6.4}$$

we have the basic problem considered in previous sections.

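For $M$ sequences of length $N$ denoted $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_M \in \mathcal{X}^N$ we have the decision rule: Given $\mathbf{y} \in \mathcal{Y}^N$ choose the sequence $\mathbf{x}_m$ that has the largest total metric

$$\sum_{n=1}^{N} m(y_n, x_{mn}) \quad \text{for } \mathbf{x}_m = (x_{m1}, x_{m2}, \ldots, x_{mN}) \tag{B.6.5}$$

The union bound for each conditional error probability is

$$P_{E_m} = \Pr\{\text{error} \mid \mathbf{x}_m\} \le \sum_{\hat{m} \ne m} \Pr\{\text{decide } \mathbf{x}_{\hat{m}} \mid \mathbf{x}_m\}, \quad m = 1, 2, \ldots, M \tag{B.6.6}$$

Here we have

$$\Pr\{\text{deciding } \mathbf{x}_{\hat{m}} \mid \mathbf{x}_m\} \le P(\mathbf{x}_m \to \mathbf{x}_{\hat{m}}) \tag{B.6.7}$$

where $P(\mathbf{x}_m \to \mathbf{x}_{\hat{m}})$ is the probability of deciding $\mathbf{x}_{\hat{m}}$ when $\mathbf{x}_m$ is sent assuming $\mathbf{x}_m$ and $\mathbf{x}_{\hat{m}}$ are the only two possible sequences. That is,

$$P(\mathbf{x}_m \to \mathbf{x}_{\hat{m}}) = \Pr\left\{\sum_{n=1}^{N} \left[m(y_n, x_{\hat{m}n}) - m(y_n, x_{mn})\right] \ge 0 \;\Big|\; \mathbf{x}_m\right\} \tag{B.6.8}$$

which is the two-hypothesis error probability. Thus, in this case, we can apply our two-hypothesis results discussed earlier.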
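The metric-based decision rule can be sketched in a few lines. The example below is hypothetical: it uses a binary symmetric channel with crossover probability `p` and the log-likelihood metric $m(y, x) = \ln p(y|x)$, neither of which is specified in the text, and the function names are illustrative.

```python
import math

def total_metric(y, x, metric):
    """Sum of per-symbol metrics m(y_n, x_n) over the sequence."""
    return sum(metric(yn, xn) for yn, xn in zip(y, x))

def decide(y, candidates, metric):
    """Index of the candidate sequence with the largest total metric."""
    return max(range(len(candidates)),
               key=lambda m: total_metric(y, candidates[m], metric))

def bsc_log_likelihood(p):
    """ML metric ln p(y|x) for a binary symmetric channel (hypothetical example)."""
    def metric(y, x):
        return math.log(1.0 - p) if y == x else math.log(p)
    return metric

def pairwise_statistic(y, x1, x2, metric):
    """Z = sum of metric differences; the two-hypothesis rule picks x1 when Z >= 0."""
    return total_metric(y, x1, metric) - total_metric(y, x2, metric)

metric = bsc_log_likelihood(0.1)
candidates = [(0, 0, 0, 0), (1, 1, 1, 1), (0, 1, 0, 1)]
y = (1, 1, 1, 0)
print(decide(y, candidates, metric))
print(pairwise_statistic(y, candidates[1], candidates[0], metric))
```

With this metric and `p < 0.5` the rule reduces to choosing the candidate at the smallest Hamming distance from the received sequence, and each pairwise comparison is an instance of the two-hypothesis problem with the statistic $Z$.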
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